(57) Abstract: Genes and proteins involved in the biosynthesis of macrolides by microorganisms, in particular the nucleic acids
forming the biosynthetic locus for the 16-member macrolide rosaramicin from Micromonospora carbonacea. These nucleic acids
can be used to make expression constructs and transformed host cells for the production of rosaramicin. The genes and proteins
allow direct manipulation of macrolides and related chemical structures via chemical engineering of the proteins involved in the
biosynthesis of rosaramicin.

TITLE OF INVENTION: Genes and proteins for the biosynthesis of rosaramicin

CROSS-REFERENCING TO RELATED APPLICATION: 

This application claims benefit under 35 USC §119 of provisional application
USSN 60/307,629 filed on July 26, 2001, which is hereby incorporated by reference in
its entirety for all purposes.

FIELD OF INVENTION: 

The present invention relates to nucleic acid molecules that encode proteins 
that direct the synthesis of macrolides, in particular the 16-member macrolide 
rosaramicin. The present invention also is directed to the use of nucleic acids and
proteins to produce compounds exhibiting antibiotic activity based on the rosaramicin
structure.

BACKGROUND: 

Rosaramicin is a 16-member macrolide antibiotic. Macrolides constitute a group
of antibiotics mainly active against Gram-positive bacteria. They have clinical
applications in the treatment of bacterial infections. Macrolide compounds are
structurally characterized by a macrolide lactone ring to which one or several
deoxy-sugar moieties are attached.




The carbohydrate ligands and macrolide lactone ring serve as molecular
recognition elements critical for biological activity. Variations in the sugar composition
of a macrolide or in the structure of the macrolide lactone ring may vary the biological
activity of the molecule. Elucidation of gene clusters involved in the biosynthesis of
rosaramicin expands the repertoire of genes and proteins useful for producing macrolides via
combinatorial biosynthesis.

The increasing number of microbial strains that have acquired resistance to the
currently available antibiotic compounds is recognized as a dangerous threat to public
health. The genes and proteins involved in the biosynthesis of rosaramicin may be
used to generate new unnatural compounds having desirable biological activity. The
genes and proteins from the rosaramicin locus may also be used as probes to identify
new rosaramicin-like natural products.

The genome of many microorganisms contains multiple natural product
biosynthetic loci that are not normally expressed in nature or under conventional
experimental conditions. For example, twenty-five secondary metabolic gene clusters
in the genome of the actinomycete Streptomyces avermitilis were identified by whole
genome shotgun sequencing despite the fact that the organism was
known to produce only two antimicrobial natural products (Omura et al., PNAS, vol. 98,
no. 21, 12215-12220). An important new source of antimicrobial compounds lies in the
products of cryptic biosynthetic loci. It is desirable to discover and characterize a
biosynthetic locus producing an antimicrobial product and present in the genome of
organisms not known to produce the antimicrobial product of the locus.

SUMMARY OF THE INVENTION: 

Micromonospora carbonacea is known to produce the antimicrobial 
orthosomycin natural product everninomicin. Micromonospora carbonacea was not
previously reported to produce other natural products. We have surprisingly 
discovered, in the Micromonospora carbonacea genome, a type I polyketide 
biosynthetic gene cluster directed to the production of a rosaramicin-type polyketide. 

The invention provides polynucleotides and polypeptides useful in the
production and engineering of macrolides. In one embodiment, the polynucleotide
molecules are selected from the contiguous DNA sequence SEQ ID NO: 1. Other
embodiments of the polynucleotides and polypeptides are provided in the
accompanying sequence listing. SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23,
25, 27, 29, 31, 33, 35, 37, 39 provide nucleic acids responsible for biosynthesis of the
16-member macrolide rosaramicin. SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22,
24, 26, 28, 30, 32, 34, 36, 38 provide amino acid sequences for proteins responsible
for biosynthesis of the 16-member macrolide rosaramicin. Certain embodiments of the
invention specifically exclude one or more of the open reading frames of the rosaramicin
biosynthetic locus, most notably any one or more of ORFs 3, 11, 13, 16, 17 and 18
(SEQ ID NOS: 7, 23, 27, 33, 35 and 37) and the corresponding gene products (SEQ
ID NOS: 6, 22, 26, 32, 34 and 36) deduced therefrom, although other ORFs and
polypeptides listed in the sequence listing can be excluded from certain embodiments
without departing from the scope of the invention.
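
The ORF and SEQ ID NO pairs cited above follow a regular pattern: ORF n corresponds to polypeptide SEQ ID NO: 2n and to nucleic acid SEQ ID NO: 2n+1. The following Python sketch illustrates that bookkeeping for the 19 ORFs of the locus. The function and variable names are hypothetical and are used here for illustration only; they do not form part of the disclosure.

```python
def orf_to_seq_ids(orf):
    """Return (nucleic_acid_seq_id, polypeptide_seq_id) for an ORF number.

    Assumes the numbering pattern inferred from the ORF/SEQ ID pairs
    recited in the text: polypeptide = 2n, nucleic acid = 2n + 1.
    """
    if not 1 <= orf <= 19:
        raise ValueError("the locus is described as containing ORFs 1 through 19")
    return 2 * orf + 1, 2 * orf


# ORFs named in the text as most notably excludable from certain embodiments.
EXCLUDABLE_ORFS = [3, 11, 13, 16, 17, 18]

nucleic_ids = [orf_to_seq_ids(n)[0] for n in EXCLUDABLE_ORFS]
polypeptide_ids = [orf_to_seq_ids(n)[1] for n in EXCLUDABLE_ORFS]

print(nucleic_ids)      # [7, 23, 27, 33, 35, 37]
print(polypeptide_ids)  # [6, 22, 26, 32, 34, 36]
```

The printed lists reproduce the SEQ ID NOS recited for ORFs 3, 11, 13, 16, 17 and 18.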

The polynucleotides and polypeptides of the invention provide the machinery for
producing novel compounds based on the structure of rosaramicin. The invention
allows direct manipulation of rosaramicin and related chemical structures via chemical
engineering of the enzymes involved in the biosynthesis of rosaramicin, modifications
which may not be presently possible by chemical methodology because of the
complexity of the structures. The invention can also be used to introduce "chemical
handles" into normally inert positions that permit subsequent chemical modifications.
Several general approaches to achieve the development of novel macrolides are
facilitated by the methods and compositions of the present invention. For example,
tylosin is structurally related to rosaramicin but, unlike rosaramicin, it does not contain
an epoxide. Accordingly, genes and proteins disclosed herein may be used to
enzymatically create a tylosin derivative that contains an epoxide modification.

Various macrolide structures can be generated by genetic manipulation of the
rosaramicin gene cluster or use of various genes from the rosaramicin gene cluster in
accordance with the methods of the invention. The invention can be used to generate
a focused library of analogs around a macrolide lead candidate to fine-tune the
compound for optimal properties. Genetic engineering methods of the invention can
be directed to modify positions of the molecule previously inert to chemical
modifications. Known techniques allow one to manipulate a known macrolide gene
cluster either to produce the macrolide compound synthesized by that gene cluster at
higher levels than occur in nature or in hosts that otherwise do not produce the
macrolide. Known techniques allow one to produce molecules that are structurally
related to, but distinct from, the macrolide compounds produced from known macrolide
gene clusters. Cloning, analysis, and manipulation by recombinant DNA technology of



WO 03/010193

PCT/CA02/01177



genes that encode rosaramicin gene products can be performed according to known
techniques.

Thus, in a first aspect the invention provides an isolated, purified or enriched
nucleic acid comprising a sequence selected from the group consisting of SEQ ID
NO: 1; the sequences complementary to SEQ ID NO: 1; fragments comprising at least
100, 200, 300, 500, 1000, 2000 or more consecutive nucleotides of SEQ ID NO: 1; and
fragments comprising at least 100, 200, 300, 500, 1000, 2000 or more consecutive
nucleotides of the sequences complementary to SEQ ID NO: 1. Preferred
embodiments of this aspect include isolated, purified or enriched nucleic acids capable
of hybridizing to the above sequences under conditions of moderate or high stringency;
isolated, purified or enriched nucleic acid comprising at least 100, 200, 300, 500, 1000,
2000 or more consecutive bases of the above sequences; and isolated, purified or
enriched nucleic acid having at least 70%, 75%, 80%, 85%, 90%, 95%, 97% or 99%
homology to the above sequences as determined by analysis with BLASTN version 2.0
with the default parameters.
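
By way of illustration only, the "complementary sequence" and "at least N consecutive nucleotides" criteria recited above can be sketched in Python as follows. The sketch is not BLASTN and does not compute percent homology; it only tests whether a candidate shares a sufficiently long uninterrupted run of bases with a reference or with its reverse complement. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_shared_run(a, b):
    """Length of the longest run of consecutive bases present in both strings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for base_a in a:
        cur = [0] * (len(b) + 1)
        for j, base_b in enumerate(b, start=1):
            if base_a == base_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def contains_fragment(candidate, reference, min_len):
    """True if candidate shares >= min_len consecutive bases with the
    reference strand or with its reverse complement."""
    cand, ref = candidate.upper(), reference.upper()
    return (longest_shared_run(cand, ref) >= min_len
            or longest_shared_run(cand, reverse_complement(ref)) >= min_len)


# Toy data (hypothetical): a 10-base stretch of the reference is embedded
# in the candidate between unrelated flanking bases.
reference = "ATGGCCGTTAACGGATCC"
candidate = "TTT" + reference[3:13] + "AAA"
print(contains_fragment(candidate, reference, 10))  # True
print(contains_fragment(candidate, reference, 15))  # False
```

The quadratic comparison is adequate for short illustrative strings; for sequences of the size of SEQ ID NO: 1 an indexed search tool would be the practical choice.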

Further embodiments of this aspect of the invention include an isolated, purified
or enriched nucleic acid comprising a sequence selected from the group consisting of
SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 and
the sequences complementary thereto; an isolated, purified or enriched nucleic acid
comprising at least 50, 75, 100, 200, 500, 600 or more consecutive bases of a
sequence selected from the group consisting of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17,
19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 and the sequences complementary thereto;
and an isolated, purified or enriched nucleic acid capable of hybridizing to the above
listed nucleic acids under conditions of moderate or high stringency, and isolated,
purified or enriched nucleic acid having at least 70%, 75%, 80%, 85%, 90%, 95%, 97%
or 99% homology to the nucleic acid of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21,
23, 25, 27, 29, 31, 33, 35, 37, 39 as determined by analysis with BLASTN version 2.0
with the default parameters.

In a second embodiment, the invention provides an isolated or purified
polypeptide comprising a sequence selected from the group consisting of SEQ ID
NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38; an isolated or
purified polypeptide comprising at least 50, 75, 100, 200, 300 or more consecutive
amino acids of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22,
24, 26, 28, 30, 32, 34, 36, 38; and an isolated or purified polypeptide having at least
70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% homology to the polypeptide of SEQ
ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 as
determined by analysis with BLASTP version 2.2.2 with the default parameters. In a
further aspect, the invention provides a polypeptide comprising one or two or three or
five or more of the above polypeptide sequences.

The invention also provides recombinant DNA expression vectors containing the
above nucleic acids. The polynucleotides and the methods of the invention enable one
skilled in the art to create recombinant host cells with the ability to produce macrolides.
Thus, the invention provides a method of preparing a macrolide compound, said
method comprising transforming a heterologous host cell with a recombinant DNA
vector that encodes at least one of the above nucleic acids, and culturing said host cell
under conditions such that a macrolide is produced. In one aspect, the method is
practiced with a Streptomyces host cell. In another aspect, the macrolide produced is
rosaramicin. In another aspect, the macrolide produced is a compound related in
structure to rosaramicin. The invention also provides a method for producing a
rosaramicin compound by culturing Micromonospora carbonacea under conditions
allowing for expression of its endogenous rosaramicin biosynthetic locus.

The invention also encompasses a method for detecting, by in silico
hybridization or traditional hybridization, putative macrolide gene clusters or
macrolide-producing microorganisms using compositions of the invention. In one embodiment, a
polypeptide encoding one or more of the polyketide synthase proteins (SEQ ID NOS:
10, 12, 14, 16 and 18) or fragments thereof are used as probes to detect putative
macrolide gene clusters by in silico hybridization.
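
As a minimal sketch of the idea of in silico screening with a probe sequence, the following Python code ranks subject sequences by the fraction of short subsequences (k-mers) they share with a probe. This is a deliberately simplified stand-in for an alignment-based comparison and is an assumption made for illustration; it is not the comparison software of the invention. The function names, the threshold, and the toy peptide strings are hypothetical and are not taken from the sequence listing.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(query, subject, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    q, s = kmers(query, k), kmers(subject, k)
    if not q or not s:
        return 0.0
    return len(q & s) / len(q | s)


def screen(query, subjects, threshold=0.5, k=3):
    """Return (name, score) pairs for subjects scoring at or above the
    threshold, best match first."""
    hits = []
    for name, seq in subjects.items():
        score = kmer_similarity(query, seq, k)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data (hypothetical peptide strings).
probe = "MKTAYIAKQR"
subjects = {
    "candidate_locus_A": "MKTAYIAKQRQISF",
    "candidate_locus_B": "GGSPLLWWDDEE",
}
for name, score in screen(probe, subjects):
    print(name, round(score, 3))
```

Only the first toy subject is reported, since the second shares no 3-mers with the probe. A production screen would instead rely on an established alignment program such as the BLAST programs named elsewhere in this description.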

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will be further understood from the following description
with reference to the following figures:

Figure 1 is a block diagram of a computer system which implements and
executes software tools for the purpose of comparing a query to a subject, wherein the
subject is selected from the reference sequences of the invention.

Figures 2A, 2B, 2C and 2D are flow diagrams of a sequence comparison
software that can be employed for the purpose of comparing a query to a subject,
wherein the subject is selected from the reference sequences of the invention, wherein
Figure 2A is the query initialization subprocess of the sequence comparison software,
Figure 2B is the subject datasource initialization subprocess of the sequence
comparison software, Figure 2C illustrates the comparison subprocess and the
analysis subprocess of the sequence comparison software, and Figure 2D is the
Display/Report subprocess of the sequence comparison software.

Figure 3 is a flow diagram of the comparator algorithm (238) of Figure 2C which
is one embodiment of a comparator algorithm that can be used for pairwise
determination of similarity between a query/subject pair.

Figure 4 is a flow diagram of the analyzer algorithm (244) of Figure 2C which is
one embodiment of an analyzer algorithm that can be used to assign identity to a
query sequence, based on similarity to a subject sequence, where the subject
sequence is a reference sequence of the invention.

Figure 5 is a graphical depiction of the rosaramicin biosynthetic locus showing,
at the top of the figure, the regions covered by the three deposited cosmid clones
010CK, 010CF and 010CJ; a scale in kilobase pairs; the positioning of the open
reading frames on a continuous black line representing the continuous DNA sequence
(SEQ ID NO: 1); and the relative position and orientation of 19 ORFs referred to by
number at the bottom of the figure.

Figure 6 illustrates the construction of the rosaramicin backbone by the Type I
polyketide synthase enzymes (PKS) in the rosaramicin biosynthetic locus.

Figure 7 illustrates a mechanism for the biosynthesis of rosaramicin.

Figures 8A and 8B represent a Clustal amino acid alignment of the eight
ketosynthase (KS) domains found in the rosaramicin PKS enzyme complex. Key
residues are highlighted.

Figures 9A and 9B represent a Clustal amino acid alignment of the eight acyl
transferase (AT) domains in the rosaramicin PKS enzyme complex. Key residues are
highlighted. Regions important in substrate recognition are indicated by V above the
alignment.

Figure 10 represents a Clustal amino acid alignment of the 3 DH domains in the
rosaramicin PKS enzyme complex. Key residues are highlighted.

Figure 11 represents a Clustal amino acid alignment comparing the single enoyl
reductase (ER) domain in the rosaramicin PKS enzyme complex to a prototypical ER
domain of the erythromycin PKS, i.e. 6-deoxyerythronolide B synthase (DEBS). Key
residues are highlighted.

Figure 12 represents a Clustal amino acid alignment of the 7 KR domains in the
rosaramicin PKS enzyme complex. Key residues are highlighted.

Figure 13 represents a Clustal amino acid alignment of the 8 ACP domains in
the rosaramicin PKS enzyme complex. The key active site serine residue is
highlighted.

Figure 14 represents a Clustal amino acid alignment comparing the single
thioesterase (Te) domain in the rosaramicin PKS enzyme complex to a prototypical Te
domain of the erythromycin PKS, DEBS.

Figure 15 represents a Clustal amino acid alignment that demonstrates the
overall high degree of homology between the second AT domain of ORF7 with two
other ethylmalonyl-CoA-specific AT domains from the tylosin and niddamycin PKS
complexes.

Figure 16 is a LCMS graph showing the production of a compound of the
molecular weight of rosaramicin.

DETAILED DESCRIPTION OF THE INVENTION: 

Throughout the description and the figures, the biosynthetic locus for
rosaramicin from Micromonospora carbonacea is sometimes referred to as ROSA.
The ORFs in ROSA are assigned a putative function sometimes referred to throughout
the description and figures by reference to a four-letter designation, as indicated in
Table 1.

Table 1

Families  ORF#  Function

ABCC      1     ABC transporter; contains repeated domain
DATF      17    dehydratase/aminotransferase; SMAT family (secondary metabolism
                aminotransferase); transaminase
GTFA      11    glycosyl transferase
MTFA      12    methyltransferase, SAM-dependent; N,N-dimethyltransferases
MTRA      19    resistance methyltransferase; 23S ribosomal
NBPA      16    unknown, nucleotide (ATP/GTP) binding protein; may be involved in regulated
                proteolysis
OXRB      10    oxidoreductase; similar to NDP-hexose-3,4-isomerases (tautomerase)
OXRC      3, 4  oxidoreductase; cytP450 monooxygenase, hydroxylase; oxygen-binding site motif:
                LLXMox^Ujt^, heme-binding pocket motif: V3XijxnxoxoxxLxr\, the cysteine is
                invariable and coordinates the heme
UAKn      13    oxidoreductase, NAD(P)-dependent; similar to crotonyl CoA reductases (CCR);
                similarity to some quinone oxidoreductases, zinc-containing alcohol
                dehydrogenases
PKSH      5-9   polyketide synthase, type I
REGM      15    regulator; similar to TylR global activator of the tylosin locus and the carbomycin
                AcyB2 positive regulator
          14    regulator, may be positive regulator; similar to spiramycin SrmR, which specifically
                activates the production of spiramycin
SURA      18    sugar reductase; iron-sulfur (4Fe-4S) protein; may be involved in 1,2-migration of
                the amino group from C4 to C3 via the Schiff's base intermediate
TESA      2     thioesterase

The terms "macrolide producer" and "macrolide-producing organism" refer to a
microorganism that carries the genetic information necessary to produce a macrolide
compound, whether or not the organism is known to produce a macrolide compound.
The terms "rosaramicin producer" and "rosaramicin-producing organism" refer to a
microorganism that carries the genetic information necessary to produce a rosaramicin
compound, whether or not the organism is known to produce a rosaramicin product.
The terms apply equally to organisms in which the genetic information to produce the
macrolide or rosaramicin compound is found in the organism as it exists in its natural
environment, and to organisms in which the genetic information is introduced by
recombinant techniques. For the sake of particularity, specific organisms
contemplated herein include organisms of the family Micromonosporaceae, of which
preferred genera include Micromonospora, Actinoplanes and Dactylosporangium; the
family Streptomycetaceae, of which preferred genera include Streptomyces and
Kitasatospora; the family Pseudonocardiaceae, of which preferred genera are
Amycolatopsis and Saccharopolyspora; and the family Actinosynnemataceae, of which
preferred genera include Saccharothrix and Actinosynnema; however the terms are
intended to encompass all organisms containing genetic information necessary to
produce a macrolide compound.

The term "rosaramicin biosynthetic gene product" refers to any enzyme or
polypeptide involved in the biosynthesis of rosaramicin. The term "rosaramicin" is
intended to encompass the compounds sometimes referred to as 4'-deoxycirramycin
A1, rosamicin, izenamicin A1, juvenimicin A3, 6108A3, M 4365A2, Sch 14947,
antibiotic 6108A3, antibiotic M 4365A2 and antibiotic Sch 14947. For the sake of
particularity, the rosaramicin biosynthetic pathway is associated with Micromonospora
carbonacea. However, it should be understood that this term encompasses
rosaramicin biosynthetic enzymes (and genes encoding such enzymes) isolated from
any microorganism of the genus Micromonospora or Streptomyces, and furthermore
that these genes may have novel homologues in related actinomycete microorganisms
or non-actinomycete microorganisms that fall within the scope of the invention.
Representative rosaramicin biosynthetic gene products include the polypeptides listed
in SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 or
homologues thereof.

The term "isolated" means that the material is removed from its original
environment, e.g. the natural environment if it is naturally-occurring. For example, a
naturally-occurring polynucleotide or polypeptide present in a living organism is not
isolated, but the same polynucleotide or polypeptide, separated from some or all of the
coexisting materials in the natural system, is isolated. Such polynucleotides could be
part of a vector and/or such polynucleotides or polypeptides could be part of a
composition, and still be isolated in that such vector or composition is not part of its
natural environment.

The term "purified" does not require absolute purity; rather, it is intended as a
relative definition. Individual nucleic acids obtained from a library have been
conventionally purified to electrophoretic homogeneity. The purified nucleic acids of
the present invention have been purified from the remainder of the genomic DNA in the
organism by at least 10^4 to 10^6 fold. However, the term "purified" also includes nucleic
acids which have been purified from the remainder of the genomic DNA or from other
sequences in a library or other environment by at least one order of magnitude,
preferably two or three orders of magnitude, and more preferably four or five orders of
magnitude.

"Recombinant" means that the nucleic acid is adjacent to "backbone" nucleic
acid to which it is not adjacent in its natural environment. "Enriched" nucleic acids
represent 5% or more of the number of nucleic acid inserts in a population of nucleic
acid backbone molecules. "Backbone" molecules include nucleic acids such as
expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids,
and other vectors or nucleic acids used to maintain or manipulate a nucleic acid of
interest. Preferably, the enriched nucleic acids represent 15% or more, more
preferably 50% or more, and most preferably 90% or more, of the number of nucleic
acid inserts in the population of recombinant backbone molecules.

"Recombinant" polypeptides or proteins refer to polypeptides or proteins
produced by recombinant DNA techniques, i.e. produced from cells transformed by an
exogenous DNA construct encoding the desired polypeptide or protein. "Synthetic"
polypeptides or proteins are those prepared by chemical synthesis.

The term "gene" means the segment of DNA involved in producing a
polypeptide chain; it includes regions preceding and following the coding region (leader
and trailer) as well as, where applicable, intervening regions (introns) between
individual coding segments (exons).

A DNA or nucleotide "coding sequence" or "sequence encoding" a particular
polypeptide or protein, is a DNA sequence which is transcribed and translated into a
polypeptide or protein when placed under the control of appropriate regulatory
sequences.

"Oligonucleotide" refers to a nucleic acid, generally of at least 10, preferably 15
and more preferably at least 20 nucleotides, preferably no more than 100 nucleotides,
that is hybridizable to a genomic DNA molecule, a cDNA molecule, or an mRNA
molecule encoding a gene, mRNA, cDNA or other nucleic acid of interest.

A promoter sequence is "operably linked to" a coding sequence recognized by
RNA polymerase which initiates transcription at the promoter and transcribes the
coding sequence into mRNA.

"Plasmids" are designated herein by a lower case p preceded or followed by
capital letters and/or numbers. The starting plasmids herein are commercially
available, publicly available on an unrestricted basis, or can be constructed from
available plasmids in accord with published procedures. In addition, equivalent
plasmids to those described herein are known in the art and will be apparent to the
skilled artisan.

"Digestion" of DNA refers to enzymatic cleavage of the DNA with a restriction
enzyme that acts only at certain sequences in the DNA. The various restriction
enzymes used herein are commercially available and their reaction conditions,
cofactors and other requirements were used as would be known to the ordinary skilled
artisan. For analytical purposes, typically 1 µg of plasmid or DNA fragment is used
with about 2 units of enzyme in about 20 µl of buffer solution. For the purpose of
isolating DNA fragments for plasmid construction, typically 5 to 50 µg of DNA are
digested with 20 to 250 units of enzyme in a larger volume. Appropriate buffers and
substrate amounts for particular enzymes are specified by the manufacturer.
Incubation times of about 1 hour at 37°C are ordinarily used, but may vary in
accordance with the supplier's instructions. After digestion, gel electrophoresis may
be performed to isolate the desired fragment.
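
The notion of an enzyme that "acts only at certain sequences" can be illustrated with a short Python sketch that cuts a linear DNA string at every occurrence of a recognition site. EcoRI (recognition site GAATTC, cleaving after the first G) is used purely as a familiar example and is an assumption of this sketch; the description does not name particular enzymes here. The function name and the toy sequence are hypothetical, and only the top strand of a linear molecule is modelled.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear DNA string at each occurrence of a recognition site.

    cut_offset is the position within the site after which the top strand
    is cleaved (1 for EcoRI: G^AATTC). Returns fragments in 5'->3' order.
    """
    dna = dna.upper()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments


# Toy sequence (hypothetical) containing two EcoRI sites.
plasmid_insert = "AAGAATTCTTGAATTCGG"
print(digest(plasmid_insert))  # ['AAG', 'AATTCTTG', 'AATTCGG']
```

Joining the fragments regenerates the input sequence, which is a convenient sanity check when predicting the band pattern expected on a gel.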

We have now discovered the genes and proteins involved in the biosynthesis of
the 16-member macrolide rosaramicin. Nucleic acid sequences encoding proteins
involved in the biosynthesis of rosaramicin are provided in the accompanying
sequence listing as SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31,
33, 35, 37, 39. Polypeptides involved in the biosynthesis of rosaramicin are provided
in the accompanying sequence listing as SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, 32, 34, 36, 38.

One aspect of the present invention is an isolated, purified, or enriched nucleic
acid comprising one of the sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19,
21, 23, 25, 27, 29, 31, 33, 35, 37, 39, the sequences complementary thereto, or a
fragment comprising at least 100, 200, 300, 400, 500, 600, 700, 800 or more
consecutive bases of one of the sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17,
19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 or the sequences complementary thereto.
The isolated, purified or enriched nucleic acids may comprise DNA, including cDNA,
genomic DNA, and synthetic DNA. The DNA may be double stranded or single
stranded, and if single stranded may be the coding (sense) or non-coding (anti-sense)
strand. Alternatively, the isolated, purified or enriched nucleic acids may comprise
RNA.

As discussed in more detail below, the isolated, purified or enriched nucleic
acids of one of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,
35, 37, 39 may be used to prepare one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8,
10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 respectively or fragments
comprising at least 50, 75, 100, 200, 300, 500 or more consecutive amino acids of one
of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30,
32, 34, 36, 38.

Accordingly, another aspect of the present invention is an isolated, purified or
enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 2, 4, 6,
8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 or fragments comprising at
least 50, 75, 100, 150, 200, 300 or more consecutive amino acids of one of the
polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
34, 36, 38. The coding sequences of these nucleic acids may be identical to one of
the coding sequences of one of the nucleic acids of SEQ ID NOS: 3, 5, 7, 9, 11, 13,
15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 or a fragment thereof or may be
different coding sequences which encode one of the polypeptides of SEQ ID NOS: 2,
4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 or fragments
comprising at least 50, 75, 100, 150, 200, 300 consecutive amino acids of one of the
polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
34, 36, 38 as a result of the redundancy or degeneracy of the genetic code. The
genetic code is well known to those of skill in the art and can be obtained, for example,
from Stryer, Biochemistry, 3rd edition, W. H. Freeman & Co., New York.
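
The degeneracy referred to above can be made concrete with a brief Python sketch that translates two different coding sequences using the standard genetic code and shows that they encode the same polypeptide. The two 12-base sequences are toy examples chosen for illustration and are not taken from the sequence listing; the function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence (reading frame 1) into one-letter
    amino acid codes; '*' marks a stop codon. Trailing partial codons
    are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Two different nucleotide sequences (toy examples) ...
seq_a = "ATGGCTCTGAGC"
seq_b = "ATGGCACTTTCA"

# ... that encode the same four-residue peptide.
print(translate(seq_a))  # MALS
print(translate(seq_b))  # MALS
```

The two sequences differ at several nucleotide positions yet yield an identical translation, which is the situation contemplated by "different coding sequences which encode one of the polypeptides".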

The isolated, purified or enriched nucleic acid which encodes one of the
polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
34, 36, 38, may include, but is not limited to: (1) only the coding sequences of one of
SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39; (2)
the coding sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29,
31, 33, 35, 37, 39 and additional coding sequences, such as leader sequences or
proprotein; and (3) the coding sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17,
19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 and non-coding sequences, such as introns
or non-coding sequences 5' and/or 3' of the coding sequence. Thus, as used herein,
the term "polynucleotide encoding a polypeptide" encompasses a polynucleotide that
includes only coding sequence for the polypeptide as well as a polynucleotide that
includes additional coding and/or non-coding sequence.

The invention relates to polynucleotides based on SEQ ID NOS: 3, 5, 7, 9, 11,
13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 but having polynucleotide changes
that are "silent", for example changes which do not alter the amino acid sequence
encoded by the polynucleotides of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23,
25, 27, 29, 31, 33, 35, 37, 39. The invention also relates to polynucleotides which
have nucleotide changes which result in amino acid substitutions, additions, deletions,
fusions and truncations of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16,
18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38. Such nucleotide changes may be introduced
using techniques such as site directed mutagenesis, random chemical mutagenesis,
exonuclease III deletion, and other recombinant DNA techniques.

The isolated, purified or enriched nucleic acids of SEQ ID NOS: 3, 5, 7, 9, 11, 
13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, the sequences complementary 
thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 
200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 3, 5, 
7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, or the sequences 
complementary thereto may be used as probes to identify and isolate DNAs encoding 
the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 
32, 34, 36, 38 respectively. In such procedures, a genomic DNA library is constructed 
from a sample microorganism or a sample containing a microorganism capable of 
producing a macrolide. The genomic DNA library is then contacted with a probe 
comprising a coding sequence or a fragment of the coding sequence, encoding one of 
the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 
32, 34, 36, 38, or a fragment thereof under conditions which permit the probe to 
specifically hybridize to sequences complementary thereto. In a preferred 
embodiment, the probe is an oligonucleotide of about 10 to about 30 nucleotides in 
length designed based on a nucleic acid of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 
21, 23, 25, 27, 29, 31, 33, 35, 37, 39. Genomic DNA clones which hybridize to the 
probe are then detected and isolated. Procedures for preparing and identifying DNA 
clones of interest are disclosed in Ausubel et al., Current Protocols in Molecular 
Biology, John Wiley & Sons, Inc. 1997; and Sambrook et al., Molecular Cloning: A 
Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989. In another 
embodiment, the probe is a restriction fragment or a PCR amplified nucleic acid 
derived from SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39. 
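
As an illustration only, the following Python sketch shows how candidate oligonucleotide probes of about 10 to about 30 nucleotides, and the sequences complementary thereto, might be enumerated from a coding sequence. The function names and the toy input sequence are hypothetical and are not taken from any of SEQ ID NOS: 3 to 39; the sketch assumes an unambiguous A/C/G/T alphabet.

```python
# Hypothetical helper names; the toy sequence below is not from the SEQ ID listing.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_probes(seq: str, length: int = 20):
    """Yield (start, probe) for every window of `length` consecutive bases.

    The 10 to 30 nucleotide bound mirrors the preferred oligonucleotide
    length stated in the text.
    """
    if not 10 <= length <= 30:
        raise ValueError("probe length should be about 10 to about 30 nucleotides")
    for start in range(len(seq) - length + 1):
        yield start, seq[start:start + length]


if __name__ == "__main__":
    toy = "ATGGCCGTACGTTAGC"  # illustrative sequence only
    for start, probe in candidate_probes(toy, length=12):
        print(start, probe, reverse_complement(probe))
```

Each window and its reverse complement are equally usable as probes in this sketch; selecting among them (for example by melting temperature) is left to the practitioner.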

The isolated, purified or enriched nucleic acids of SEQ ID NOS: 3, 5, 7, 9, 11, 
13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, the sequences complementary 
thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 
200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 3, 5, 
7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, or the sequences 
complementary thereto may be used as probes to identify and isolate related nucleic 
acids. In some embodiments, the related nucleic acids may be genomic DNAs (or 
cDNAs) from potential macrolide producers. In such procedures, a nucleic acid 
sample containing nucleic acids from a potential macrolide-producer or 
rosaramicin-producer is contacted with the probe under conditions that permit the probe to 
specifically hybridize to related sequences. The nucleic acid sample may be a 
genomic DNA (or cDNA) library from the potential macrolide-producer. Hybridization of 
the probe to nucleic acids is then detected using any of the methods described above. 

Hybridization may be carried out under conditions of low stringency, moderate 
stringency or high stringency. As an example of nucleic acid hybridization, a polymer 
membrane containing immobilized denatured nucleic acids is first prehybridized for 30 
minutes at 45 °C in a solution consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 
mM Na2EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/ml polyriboadenylic acid. 
Approximately 2 x 10^7 cpm (specific activity 4-9 x 10^8 cpm/ug) of end-labeled 
oligonucleotide probe are then added to the solution. After 12-16 hours of incubation, 
the membrane is washed for 30 minutes at room temperature in 1X SET (150 mM 
NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, 
followed by a 30 minute wash in fresh 1X SET at Tm-10°C for the oligonucleotide 
probe where Tm is the melting temperature. The membrane is then exposed to 
autoradiographic film for detection of hybridization signals. 

By varying the stringency of the hybridization conditions used to identify nucleic 
acids, such as genomic DNAs or cDNAs, which hybridize to the detectable probe, 
nucleic acids having different levels of homology to the probe can be identified and 
isolated. Stringency may be varied by conducting the hybridization at varying 
temperatures below the melting temperatures of the probes. The melting temperature 
of the probe may be calculated using the following formulas: 

For oligonucleotide probes between 14 and 70 nucleotides in length the melting 
temperature (Tm) in degrees Celsius may be calculated using the formula: 
Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G+C) - (600/N) where N is the length of the 
oligonucleotide. 

If the hybridization is carried out in a solution containing formamide, the melting 
temperature may be calculated using the equation Tm = 81.5 + 16.6(log [Na+]) + 
0.41(fraction G+C) - (0.63% formamide) - (600/N) where N is the length of the probe. 
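
The two melting-temperature formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the logarithm is taken as base 10, [Na+] is taken in mol/L, and the G+C term is passed through exactly as the text gives it ("fraction G+C"). Commonly cited forms of this equation express the G+C term as a percentage, so the caller should supply the value in whichever units are intended. The function name and parameter names are hypothetical.

```python
import math


def melting_temperature(na_molar: float, gc_term: float, length: int,
                        formamide_pct: float = 0.0) -> float:
    """Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(G+C) - 0.63*(% formamide) - 600/N.

    With formamide_pct = 0 this reduces to the formamide-free formula.
    Assumptions: base-10 logarithm, [Na+] in mol/L, gc_term in the units
    the caller intends (the text says "fraction G+C").
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    if length <= 0:
        raise ValueError("probe length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_term
            - 0.63 * formamide_pct
            - 600.0 / length)


if __name__ == "__main__":
    # 20-mer, 1 M Na+, G+C term of 0.5, without and with 50% formamide
    print(melting_temperature(1.0, 0.5, 20))
    print(melting_temperature(1.0, 0.5, 20, formamide_pct=50.0))
```

Keeping formamide as an optional argument lets one function cover both equations, since the second differs from the first only by the formamide term.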

Prehybridization may be carried out in 6X SSC, 5X Denhardt's reagent, 0.5% 
SDS, 0.1 mg/ml denatured fragmented salmon sperm DNA or 6X SSC, 5X Denhardt's 
reagent, 0.5% SDS, 0.1 mg/ml denatured fragmented salmon sperm DNA, 50% 
formamide. The compositions of the SSC and Denhardt's solutions are listed in 
Sambrook et al., supra. 

Hybridization is conducted by adding the detectable probe to the hybridization 
solutions listed above. Where the probe comprises double stranded DNA, it is 
denatured by incubating at elevated temperatures and quickly cooling before addition 
to the hybridization solution. It may also be desirable to similarly denature single 
stranded probes to eliminate or diminish formation of secondary structures or 
oligomerization. The filter is contacted with the hybridization solution for a sufficient 
period of time to allow the probe to hybridize to cDNAs or genomic DNAs containing 
sequences complementary thereto or homologous thereto. For probes over 200 
nucleotides in length, the hybridization may be carried out at 15-25 °C below the Tm. 
For shorter probes, such as oligonucleotide probes, the hybridization may be 
conducted at 5-10 °C below the Tm. Preferably, the hybridization is conducted in 6X 
SSC, for shorter probes. Preferably, the hybridization is conducted in 50% formamide 
containing solutions, for longer probes. 
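
The choice of hybridization temperature relative to the Tm described above can be sketched as a small Python helper. The function name is hypothetical, and the treatment of a probe of exactly 200 nucleotides as a "shorter" probe is an assumption, since the text only distinguishes probes "over 200 nucleotides" from shorter ones.

```python
from typing import Tuple


def hybridization_window(tm_celsius: float, probe_length: int) -> Tuple[float, float]:
    """Return (lowest, highest) suggested hybridization temperature in deg C.

    Probes over 200 nucleotides: 15-25 deg C below the Tm.
    Shorter probes (assumed here to include exactly 200): 5-10 deg C below the Tm.
    """
    if probe_length <= 0:
        raise ValueError("probe length must be positive")
    if probe_length > 200:
        return tm_celsius - 25.0, tm_celsius - 15.0
    return tm_celsius - 10.0, tm_celsius - 5.0


if __name__ == "__main__":
    print(hybridization_window(70.0, 500))  # long probe
    print(hybridization_window(60.0, 20))   # oligonucleotide probe
```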

All the foregoing hybridizations would be considered to be examples of 
hybridization performed under conditions of high stringency. 

Following hybridization, the filter is washed for at least 15 minutes in 2X 
SSC, 0.1% SDS at room temperature or higher, depending on the desired stringency. 
The filter is then washed with 0.1X SSC, 0.5% SDS at room temperature (again) for 30 
minutes to 1 hour. 

Nucleic acids which have hybridized to the probe are identified by 
conventional autoradiography and non-radioactive detection methods. 

The above procedure may be modified to identify nucleic acids having 
decreasing levels of homology to the probe sequence. For example, to obtain nucleic 
acids of decreasing homology to the detectable probe, less stringent conditions may 
be used. For example, the hybridization temperature may be decreased in increments 
of 5 °C from 68 °C to 42 °C in a hybridization buffer having a Na+ concentration of 
approximately 1M. Following hybridization, the filter may be washed with 2X SSC, 
0.5% SDS at the temperature of hybridization. These conditions are considered to be 
"moderate stringency" conditions above 50°C and "low stringency" conditions below 
50°C. A specific example of "moderate stringency" hybridization conditions is when 
the above hybridization is conducted at 55°C. A specific example of "low stringency" 
hybridization conditions is when the above hybridization is conducted at 45°C. 

Alternatively, the hybridization may be carried out in buffers, such as 6X 
SSC, containing formamide at a temperature of 42 °C. In this case, the concentration 
of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 
0% to identify clones having decreasing levels of homology to the probe. Following 
hybridization, the filter may be washed with 6X SSC, 0.5% SDS at 50 °C. These 
conditions are considered to be "moderate stringency" conditions above 25% 
formamide and "low stringency" conditions below 25% formamide. A specific example 
of "moderate stringency" hybridization conditions is when the above hybridization is 
conducted at 30% formamide. A specific example of "low stringency" hybridization 
conditions is when the above hybridization is conducted at 10% formamide. 
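
The stepwise reduction in stringency described in the two preceding paragraphs can be sketched as follows. The function names are hypothetical. Because the text defines "moderate" only above the threshold and "low" only below it, a value exactly at the threshold (50 °C, or 25% formamide) is reported as unspecified in this sketch.

```python
def temperature_series(start: int = 68, stop: int = 42, step: int = 5) -> list:
    """Hybridization temperatures (deg C) stepped down from start toward stop."""
    return list(range(start, stop - 1, -step))


def formamide_series(start: int = 50, stop: int = 0, step: int = 5) -> list:
    """Formamide concentrations (%) stepped down from start to stop."""
    return list(range(start, stop - 1, -step))


def classify_stringency(value: float, threshold: float) -> str:
    """Classify a condition as 'moderate' (above threshold) or 'low' (below).

    Use threshold=50 for temperature in deg C (approx. 1 M Na+ buffer) and
    threshold=25 for % formamide (6X SSC at 42 deg C), per the text.
    """
    if value > threshold:
        return "moderate"
    if value < threshold:
        return "low"
    return "unspecified"


if __name__ == "__main__":
    for t in temperature_series():
        print(t, "deg C:", classify_stringency(t, 50))
    for f in formamide_series():
        print(f, "% formamide:", classify_stringency(f, 25))
```

Note that stepping down from 68 °C in 5 °C increments ends at 43 °C rather than exactly 42 °C; the sketch simply stops at the last value that does not fall below the stated lower bound.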

Nucleic acids which have hybridized to the probe are identified by 
conventional autoradiography and non-radioactive detection methods. 

For example, the preceding methods may be used to isolate nucleic acids 
having a sequence with at least 97%, at least 95%, at least 90%, at least 85%, at least 
80%, or at least 70% homology to a nucleic acid sequence selected from the group 
consisting of the sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 
27, 29, 31, 33, 35, 37, 39, fragments comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 
75, 100, 150, 200, 300, 400, or 500 consecutive bases thereof, and the sequences 
complementary thereto. Homology may be measured using BLASTN version 2.0 with 
the default parameters. For example, the homologous polynucleotides may have a 
coding sequence that is a naturally occurring allelic variant of one of the coding 
sequences described herein. Such allelic variant may have a substitution, deletion or 
addition of one or more nucleotides when compared to the nucleic acids of SEQ ID 
NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, or the 
sequences complementary thereto. 

Additionally, the above procedures may be used to isolate nucleic acids 
which encode polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at 
least 80%, or at least 70% homology to a polypeptide having the sequence of one of 
SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or 
fragments comprising at least 50, 75, 100, 150, 200, 300 consecutive amino acids 
thereof as determined using the BLASTP version 2.2.2 algorithm with default 
parameters. 

Another aspect of the present invention is an isolated or purified polypeptide 
comprising the sequence of one of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 
24, 26, 28, 30, 32, 34, 36, 38 or fragments comprising at least 50, 75, 100, 150, 200 or 
300 consecutive amino acids thereof. As discussed herein, such polypeptides may be 
obtained by inserting a nucleic acid encoding the polypeptide into a vector such that 
the coding sequence is operably linked to a sequence capable of driving the 
expression of the encoded polypeptide in a suitable host cell. For example, the 
expression vector may comprise a promoter, a ribosome binding site for translation 
initiation and a transcription terminator. The vector may also include appropriate 
sequences for modulating expression levels, an origin of replication and a selectable 
marker. 

Promoters suitable for expressing the polypeptide or fragment thereof in 
bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, 
the T3 promoter, the T7 promoter, the gpt promoter, the lambda PR promoter, the 
lambda PL promoter, promoters from operons encoding glycolytic enzymes such as 
3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. Fungal 
promoters include the α factor promoter. Eukaryotic promoters include the CMV 
immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, 
the early and late SV40 promoter, LTRs from retroviruses, and the mouse 
metallothionein-I promoter. Other promoters known to control expression of genes in 
prokaryotic or eukaryotic cells or their viruses may also be used. 

Mammalian expression vectors may also comprise an origin of replication, 
any necessary ribosome binding sites, a polyadenylation site, splice donors and 
acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed 
sequences. In some embodiments, DNA sequences derived from the SV40 splice and 
polyadenylation sites may be used to provide the required nontranscribed genetic 
elements. 

Vectors for expressing the polypeptide or fragment thereof in eukaryotic cells 
may also contain enhancers to increase expression levels. Enhancers are cis-acting 
elements of DNA, usually from about 10 to about 300 bp in length that act on a 
promoter to increase its transcription. Examples include the SV40 enhancer on the 
late side of the replication origin bp 100 to 270, the cytomegalovirus early promoter 
enhancer, the polyoma enhancer on the late side of the replication origin, and the 
adenovirus enhancers. 

In addition, the expression vectors preferably contain one or more selectable 
marker genes to permit selection of host cells containing the vector. Examples of 
selectable markers that may be used include genes encoding dihydrofolate reductase 
or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring 
tetracycline or ampicillin resistance in E. coli, and the S. cerevisiae TRP1 gene. 

In some embodiments, the nucleic acid encoding one of the polypeptides of 
SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or 
fragments comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids 
thereof is assembled in appropriate phase with a leader sequence capable of directing 
secretion of the translated polypeptides or fragments thereof. Optionally, the nucleic 
acid can encode a fusion polypeptide in which one of the polypeptides of SEQ ID NOS: 
2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 or fragments 
comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino 
acids thereof is fused to heterologous peptides or polypeptides, such as N-terminal 
identification peptides which impart desired characteristics such as increased stability 
or simplified purification or detection. 

The appropriate DNA sequence may be inserted into the vector by a variety 
of procedures. In general, the DNA sequence is ligated to the desired position in the 
vector following digestion of the insert and the vector with appropriate restriction 
endonucleases. Alternatively, appropriate restriction enzyme sites can be engineered 
into a DNA sequence by PCR. A variety of cloning techniques are disclosed in Ausubel 
et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997 and 
Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor 
Laboratory Press, 1989. Such procedures and others are deemed to be within the 
scope of those skilled in the art. 

The vector may be, for example, in the form of a plasmid, a viral particle, or a 
phage. Other vectors include derivatives of chromosomal, nonchromosomal and 
synthetic DNA sequences, viruses, bacterial plasmids, phage DNA, baculovirus, yeast 
plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA 
such as vaccinia, adenovirus, fowl pox virus, and pseudorabies. A variety of cloning 
and expression vectors for use with prokaryotic and eukaryotic hosts are described by 
Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold 
Spring Harbor, N.Y., (1989). 

Particular bacterial vectors which may be used include the commercially 
available plasmids comprising genetic elements of the well known cloning vector 
pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), 
pGEM1 (Promega Biotec, Madison, WI, USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, 
phiX174, pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, 
pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), pKK232-8 and pCM7. Particular 
eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, 
pMSG, and pSVL (Pharmacia). However, any other vector may be used as long as it 
is replicable and stable in the host cell. 

The host cell may be any of the host cells familiar to those skilled in the art, 
including prokaryotic cells or eukaryotic cells. As representative examples of 
appropriate hosts, there may be mentioned: bacteria cells, such as E. coli, 
Streptomyces, Bacillus subtilis, Salmonella typhimurium and various species within the 
genera Pseudomonas, Streptomyces, and Staphylococcus, fungal cells, such as yeast, 
insect cells such as Drosophila S2 and Spodoptera Sf9, animal cells such as CHO, 
COS or Bowes melanoma, and adenoviruses. The selection of an appropriate host is 
within the abilities of those skilled in the art. 

The vector may be introduced into the host cells using any of a variety of 
techniques, including electroporation, transformation, transfection, transduction, viral 
infection, gene guns, or Ti-mediated gene transfer. Where appropriate, the engineered 
host cells can be cultured in conventional nutrient media modified as appropriate for 
activating promoters, selecting transformants or amplifying the genes of the present 
invention. Following transformation of a suitable host strain and growth of the host 
strain to an appropriate cell density, the selected promoter may be induced by 
appropriate means (e.g., temperature shift or chemical induction) and the cells may be 
cultured for an additional period to allow them to produce the desired polypeptide or 
fragment thereof. 

Cells are typically harvested by centrifugation, disrupted by physical or 
chemical means, and the resulting crude extract is retained for further purification. 
Microbial cells employed for expression of proteins can be disrupted by any convenient 
method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell 
lysing agents. Such methods are well known to those skilled in the art. The expressed 
polypeptide or fragment thereof can be recovered and purified from recombinant cell 
cultures by methods including ammonium sulfate or ethanol precipitation, acid 
extraction, anion or cation exchange chromatography, phosphocellulose 
chromatography, hydrophobic interaction chromatography, affinity chromatography, 
hydroxylapatite chromatography and lectin chromatography. Protein refolding steps 
can be used, as necessary, in completing configuration of the polypeptide. If desired, 
high performance liquid chromatography (HPLC) can be employed for final purification 
steps. 

Various mammalian cell culture systems can also be employed to express 
recombinant protein. Examples of mammalian expression systems include the COS-7 
lines of monkey kidney fibroblasts (described by Gluzman, Cell, 23:175 (1981)), and 
other cell lines capable of expressing proteins from a compatible vector, such as the 
C127, 3T3, CHO, HeLa and BHK cell lines. 

The constructs in host cells can be used in a conventional manner to 
produce the gene product encoded by the recombinant sequence. Depending upon 
the host employed in a recombinant production procedure, the polypeptide produced 
by host cells containing the vector may be glycosylated or may be non-glycosylated. 
Polypeptides of the invention may or may not also include an initial methionine amino 
acid residue. 

Alternatively, the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 
20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or fragments comprising at least 50, 75, 100, 
150, 200 or 300 consecutive amino acids thereof can be synthetically produced by 
conventional peptide synthesizers. In other embodiments, fragments or portions of the 
polynucleotides may be employed for producing the corresponding full-length 
polypeptide by peptide synthesis; therefore, the fragments may be employed as 
intermediates for producing the full-length polypeptides. 

Cell-free translation systems can also be employed to produce one of the 
polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 
34, 36, 38, or fragments comprising at least 50, 75, 100, 150, 200 or 300 consecutive 
amino acids thereof using mRNAs transcribed from a DNA construct comprising a 
promoter operably linked to a nucleic acid encoding the polypeptide or fragment 
thereof. In some embodiments, the DNA construct may be linearized prior to 
conducting an in vitro transcription reaction. The transcribed mRNA is then incubated 
with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to 
produce the desired polypeptide or fragment thereof. 

The present invention also relates to variants of the polypeptides of SEQ ID 
NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or fragments 
comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof. The 
term "variant" includes derivatives or analogs of these polypeptides. In particular, the 
variants may differ in amino acid sequence from the polypeptides of SEQ ID NOS: 2, 
4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, by one or more 
substitutions, additions, deletions, fusions and truncations, which may be present in 
any combination. 

The variants may be naturally occurring or created in vitro. In particular, 
such variants may be created using genetic engineering techniques such as site 
directed mutagenesis, random chemical mutagenesis, Exonuclease III deletion 
procedures, and standard cloning techniques. Alternatively, such variants, fragments, 
analogs, or derivatives may be created using chemical synthesis or modification 
procedures. 

Other methods of making variants are also familiar to those skilled in the art. 
These include procedures in which nucleic acid sequences obtained from natural 
isolates are modified to generate nucleic acids that encode polypeptides having 
characteristics which enhance their value in industrial or laboratory applications. In 
such procedures, a large number of variant sequences having one or more nucleotide 
differences with respect to the sequence obtained from the natural isolate are 
generated and characterized. Preferably, these nucleotide differences result in amino 
acid changes with respect to the polypeptides encoded by the nucleic acids from the 
natural isolates. 

For example, variants may be created using error prone PCR. In error prone 
PCR, DNA amplification is performed under conditions where the fidelity of the DNA 
polymerase is low, such that a high rate of point mutation is obtained along the entire 
length of the PCR product. Error prone PCR is described in Leung, D.W., et al., 
Technique, 1:11-15 (1989) and Caldwell, R. C. & Joyce G.F., PCR Methods Applic., 
2:28-33 (1992). Variants may also be created using site directed mutagenesis to 
generate site-specific mutations in any cloned DNA segment of interest. 
Oligonucleotide mutagenesis is described in Reidhaar-Olson, J.F. & Sauer, R.T., et al., 
Science, 241:53-57 (1988). Variants may also be created using directed evolution 
strategies such as those described in US patent nos. 6,361,974 and 6,372,497. The 
variants of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 
26, 28, 30, 32, 34, 36, 38, may be (1) variants in which one or more of the amino acid 
residues of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 
26, 28, 30, 32, 34, 36, 38, are substituted with a conserved or non-conserved amino 
acid residue (preferably a conserved amino acid residue) and such substituted amino 
acid residue may or may not be one encoded by the genetic code. 

Conservative substitutions are those that substitute a given amino acid in a 
polypeptide by another amino acid of like characteristics. Typically seen as 
conservative substitutions are the following replacements: replacements of an aliphatic 
amino acid such as Ala, Val, Leu and Ile with another aliphatic amino acid; 
replacement of a Ser with a Thr or vice versa; replacement of an acidic residue such 
as Asp or Glu with another acidic residue; replacement of a residue bearing an amide 
group, such as Asn or Gln, with another residue bearing an amide group; exchange of 
a basic residue such as Lys or Arg with another basic residue; and replacement of an 
aromatic residue such as Phe or Tyr with another aromatic residue. 
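
The replacement classes listed above can be encoded as a simple lookup, as in the following Python sketch. The one-letter amino acid codes are the standard ones; the group membership is limited to the residues explicitly named in the text, and the function name is hypothetical.

```python
# Groups follow the residues named in the text, in one-letter code.
CONSERVATIVE_GROUPS = (
    frozenset("AVLI"),  # aliphatic: Ala, Val, Leu, Ile
    frozenset("ST"),    # Ser / Thr
    frozenset("DE"),    # acidic: Asp, Glu
    frozenset("NQ"),    # amide-bearing: Asn, Gln
    frozenset("KR"),    # basic: Lys, Arg
    frozenset("FY"),    # aromatic: Phe, Tyr
)


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative_substitution("L", "I"))  # aliphatic for aliphatic
    print(is_conservative_substitution("K", "D"))  # basic for acidic
```
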

Other variants are those in which one or more of the amino acid residues of 
the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 
32, 34, 36, 38 includes a substituent group. 

Still other variants are those in which the polypeptide is associated with 
another compound, such as a compound to increase the half-life of the polypeptide (for 
example, polyethylene glycol). 

Additional variants are those in which additional amino acids are fused to the 
polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence 
or a sequence which facilitates purification, enrichment, or stabilization of the 
polypeptide. 

In some embodiments, the fragments, derivatives and analogs retain the 
same biological function or activity as the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 
12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38. In other embodiments, the 
fragment, derivative or analogue includes a fused heterologous sequence which 
facilitates purification, enrichment, detection, stabilization or secretion of the 
polypeptide that can be enzymatically cleaved, in whole or in part, away from the 
fragment, derivative or analogue. 

Another aspect of the present invention are polypeptides or fragments 
thereof which have at least 70%, at least 80%, at least 85%, at least 90%, or more 
than 95% homology to one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 
16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or a fragment comprising at least 50, 75, 
100, 150, 200 or 300 consecutive amino acids thereof. Homology may be determined 
using a program, such as BLASTP version 2.2.2 with the default parameters, which 
aligns the polypeptides or fragments being compared and determines the extent of 
amino acid identity or similarity between them. It will be appreciated that amino acid 
"homology" includes conservative substitutions such as those described above. 

The polypeptides or fragments having homology to one of the polypeptides 
of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or 
a fragment comprising at least 50, 75, 100, 150, 200 or 300 consecutive amino acids 
thereof may be obtained by isolating the nucleic acids encoding them using the 
techniques described above. 

Alternatively, the homologous polypeptides or fragments may be obtained 
through biochemical enrichment or purification procedures. The sequence of 
potentially homologous polypeptides or fragments may be determined by proteolytic 
digestion, gel electrophoresis and/or microsequencing. The sequence of the 
prospective homologous polypeptide or fragment can be compared to one of the 
polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 
34, 36, 38, or a fragment comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 
or 150 consecutive amino acids thereof using a program such as BLASTP version 
2.2.2 with the default parameters. 
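
To illustrate the distinction between amino acid identity and similarity drawn above, the following Python sketch scores two sequences that have already been aligned. It is not BLASTP and does not reproduce its alignment or scoring; it assumes equal-length, pre-aligned strings in one-letter code with "-" marking gaps, skips gapped columns (an assumption), and counts as "similar" only the conservative replacement classes named earlier in the text. The function name and example sequences are hypothetical.

```python
# Conservative classes as named in the text (one-letter codes).
_GROUPS = (
    frozenset("AVLI"), frozenset("ST"), frozenset("DE"),
    frozenset("NQ"), frozenset("KR"), frozenset("FY"),
)


def identity_and_similarity(aligned_a: str, aligned_b: str):
    """Return (percent identity, percent similarity) over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped columns are skipped in this sketch
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif any(x in g and y in g for g in _GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # Toy aligned peptides, not taken from the SEQ ID listing
    print(identity_and_similarity("ALKD", "AIKE"))
```

A threshold such as "at least 70% homology" could then be checked against either figure, depending on whether conservative substitutions are to be counted.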

The polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 
26, 28, 30, 32, 34, 36, 38, or fragments, derivatives or analogs thereof comprising at 
least 40, 50, 75, 100, 150, 200 or 300 consecutive amino acids thereof may 
be used in a variety of applications. For example, the polypeptides or fragments, 
derivatives or analogs thereof may be used to catalyze certain biochemical reactions. 
In particular, the polypeptides of the TESA family, namely SEQ ID NO: 4 or fragments, 
derivatives or analogs thereof; the PKSH family, namely SEQ ID NOS: 10, 12, 14, 16, 
18 or fragments, derivatives or analogs thereof; the OXRH family, namely SEQ ID NO: 
26 or fragments, derivatives or analogs thereof may be used in any combination, in 
vitro or in vivo, to direct or enhance the synthesis or modification of a polyketide, 
polyketide substructure, or precursor thereof. Polypeptides of the MTFA family, 
namely SEQ ID NO: 24 or fragments, derivatives or analogs thereof may be used, in 
vitro or in vivo, to catalyze methylation reactions that modify compounds that are either 
endogenously produced by the host, supplemented to the growth medium, or are 
added to a cell-free, purified or enriched preparation of MTFA polypeptide. 
Polypeptides of the OXRC family, namely SEQ ID NOS: 6, 8 or fragments, derivatives 
or analogs thereof; the OXRB family, namely SEQ ID NO: 20 or fragments, derivatives 
or analogs thereof; the OXRH family, namely SEQ ID NO: 26 or fragments, derivatives 
or analogs thereof may be used, in vitro or in vivo, to catalyze oxidation reactions that 
modify compounds that are either endogenously produced by the host, supplemented 
to the growth medium, or are added to a cell-free, purified or enriched preparation of 
said polypeptide. Polypeptides of the NBPA family, namely SEQ ID NO: 32 or 
fragments, derivatives or analogs thereof; the OXRB family, namely SEQ ID NO: 20 or 
fragments, derivatives or analogs thereof; the DATF family, namely SEQ ID NO: 34 or 
fragments, derivatives or analogs thereof; the SURA family, namely SEQ ID NO: 36 or 
fragments, derivatives or analogs thereof; the MTFA family, namely SEQ ID NO: 24 or 
fragments, derivatives or analogs thereof; the GTFA family, namely SEQ ID NO: 22 or 
fragments, derivatives or analogs thereof may be used, in vitro or in vivo, to catalyze 
biochemical reactions involved in activating, modifying, or transferring sugar moieties. 
Polypeptides of the ABCC family, namely SEQ ID NO: 2 or fragments, derivatives or 
analogs thereof; the MTRA family, namely SEQ ID NO: 38 or fragments, derivatives or 
analogs thereof may be used to confer to microorganisms or eukaryotic cells 
resistance to polyketides, macrolides, rosaramicin, or compounds related to 
rosaramicin. Polypeptides of the REGS family, namely SEQ ID NO: 28 or fragments, 
derivatives or analogs thereof; the REGM family, namely SEQ ID NO: 30 or fragments, 
derivatives or analogs thereof may be used to increase the yield of polyketides, 
macrolides, rosaramicin, or compounds related to rosaramicin in either naturally 
producing organisms or heterologously producing recombinant organisms. 

The polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 
26, 28, 30, 32, 34, 36, 38, or fragments, derivatives or analogues thereof comprising at 
least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, 
may also be used to generate antibodies which bind specifically to the polypeptides or 
fragments, derivatives or analogues. The antibodies generated from SEQ ID NOS: 2, 
4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 may be used to 
determine whether a biological sample contains Micromonospora carbonacea or a 
related microorganism. 

In such procedures, a biological sample is contacted with an antibody 
capable of specifically binding to one of the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 
10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or fragments comprising at 
least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof. 
The ability of the biological sample to bind to the antibody is then determined. For 
example, binding may be determined by labeling the antibody with a detectable label 
such as a fluorescent agent, an enzymatic label, or a radioisotope. Alternatively, 
binding of the antibody to the sample may be detected using a secondary antibody 
having such a detectable label thereon. A variety of assay protocols which may be 
used to detect the presence of a rosaramicin-producer or of Micromonospora 
carbonacea or of polypeptides related to SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 
20, 22, 24, 26, 28, 30, 32, 34, 36, 38, in a sample are familiar to those skilled in the art. 
Particular assays include ELISA assays, sandwich assays, radioimmunoassays, and 
Western Blots. Alternatively, antibodies generated from SEQ ID NOS: 2, 4, 6, 8, 10, 
12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, may be used to determine 
whether a biological sample contains related polypeptides that may be involved in the 
biosynthesis of natural products of the rosaramicin class or other macrolides. 

Polyclonal antibodies generated against the polypeptides of SEQ ID NOS: 2, 
4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or fragments 
comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino 
acids thereof can be obtained by direct injection of the polypeptides into an animal or 
by administering the polypeptides to an animal, preferably a nonhuman. The antibody 
so obtained will then bind the polypeptide itself. In this manner, even a sequence 
encoding only a fragment of the polypeptide can be used to generate antibodies that 
may bind to the whole native polypeptide. Such antibodies can then be used to isolate 
the polypeptide from cells expressing that polypeptide. 

For preparation of monoclonal antibodies, any technique which provides 
antibodies produced by continuous cell line cultures can be used. Examples include 
the hybridoma technique (Kohler and Milstein, 1975, Nature 256:495-497), the trioma 
technique, the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology 
Today 4:72), and the EBV-hybridoma technique (Cole et al., 1985, in Monoclonal 
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 

Techniques described for the production of single chain antibodies (U.S. 
Patent 4,946,778) can be adapted to produce single chain antibodies to the 
polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 
34, 36, 38, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 
or 150 consecutive amino acids thereof. Alternatively, transgenic mice may be used to 
express humanized antibodies to these polypeptides or fragments thereof. 

Antibodies generated against the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 
10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or fragments comprising at 
least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof 
may be used in screening for similar polypeptides from a sample containing organisms 
or cell-free extracts thereof. In such techniques, polypeptides from the sample are 
contacted with the antibodies and those polypeptides which specifically bind the 
antibody are detected. Any of the procedures described above may be used to detect 
antibody binding. One such screening assay is described in "Methods for measuring 
Cellulase Activities", Methods in Enzymology, Vol 160, pp. 87-116. 

As used herein, the term "nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 
13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39" encompasses the nucleotide 
sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39, fragments of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 
33, 35, 37, 39, nucleotide sequences homologous to SEQ ID NOS: 3, 5, 7, 9, 11, 13, 
15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, or homologous to fragments of SEQ 
ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, and 
sequences complementary to all of the preceding sequences. The fragments include 
portions of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 
or 500 consecutive nucleotides of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 
25, 27, 29, 31, 33, 35, 37, 39. Preferably, the fragments are novel fragments. 
Homologous sequences and fragments of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 
21, 23, 25, 27, 29, 31, 33, 35, 37, 39 refer to a sequence having at least 99%, 98%, 
97%, 96%, 95%, 90%, 80%, 75% or 70% identity to these sequences. Homology may 
be determined using any of the computer programs and parameters described herein, 
including BLASTN and TBLASTX with the default parameters. Homologous 
sequences also include RNA sequences in which uridines replace the thymines in the 
nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 
33, 35, 37, 39. 
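
The identity thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps); the function names and the default 70% cutoff are illustrative choices, and the calculation is a simplification rather than the statistic reported by BLASTN or TBLASTX.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes `a` and `b` are pre-aligned to the same length; gap columns
    ('-') never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def is_homologous(a: str, b: str, threshold: float = 70.0) -> bool:
    """True when the percent identity meets the chosen threshold."""
    return percent_identity(a, b) >= threshold
```

For example, two aligned 10-mers that differ at one position are 90% identical, which satisfies a 70% or 90% threshold but not a 95% one.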

The homologous sequences may be obtained using any of the procedures 
described herein or may result from the correction of a sequencing error. It will be 
appreciated that the nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 
21, 23, 25, 27, 29, 31, 33, 35, 37, 39 can be represented in the traditional single 
character format in which G, A, T and C denote the guanine, adenine, thymine and 
cytosine bases of the deoxyribonucleic acid (DNA) sequence respectively, or in which 
G, A, U and C denote the guanine, adenine, uracil and cytosine bases of the 
ribonucleic acid (RNA) sequence (see the inside back cover of Stryer, Biochemistry, 3rd 
edition, W. H. Freeman & Co., New York) or in any other format which records the 
identity of the nucleotides in a sequence. 
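
As a small illustration of the single character format, the sketch below converts a DNA string to its RNA representation and derives the complementary strand. The function names are hypothetical, and the code assumes the input contains only the unambiguous characters G, A, T and C.

```python
# Translation table pairing each DNA base with its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_rna(dna: str) -> str:
    """Return the RNA form of a DNA sequence (U replaces T)."""
    return dna.upper().replace("T", "U")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]
```

With these helpers, `to_rna("GATTC")` gives `"GAUUC"` and `reverse_complement("GATTC")` gives `"GAATC"`.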

"Polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 
26, 28, 30, 32, 34, 36, 38" encompass the polypeptide sequences of SEQ ID NOS: 2, 
4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 which are encoded by 
the nucleic acid sequences of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 
27, 29, 31, 33, 35, 37, 39, polypeptide sequences homologous to the polypeptides of 
SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or 
fragments of any of the preceding sequences. Homologous polypeptide sequences 
refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 
85%, 80%, 75% or 70% identity to one of the polypeptide sequences of SEQ ID NOS: 
2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38. Polypeptide 
sequence homology may be determined using any of the computer programs and 
parameters described herein, including BLASTP version 2.2.1 with the default 
parameters or with any user-specified parameters. The homologous sequences may 
be obtained using any of the procedures described herein or may result from the 
correction of a sequencing error. The polypeptide fragments comprise at least 5, 10, 
15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of the polypeptides 
of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38. 
Preferably the fragments are novel fragments. It will be appreciated that the 
polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 
30, 32, 34, 36, 38 can be represented in the traditional single character format or three 
letter format (see the inside back cover of Stryer, Biochemistry, 3rd edition, W.H. 
Freeman & Co., New York) or in any other format which relates the identity of the 
polypeptides in a sequence. 

It will be readily appreciated by those skilled in the art that the nucleic acid 
codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 
39, and the polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 
24, 26, 28, 30, 32, 34, 36, 38 can be stored, recorded and manipulated on any medium 
which can be read and accessed by a computer. As used herein, the words "recorded" 
and "stored" refer to a process for storing information on a computer medium. A skilled 
artisan can readily adopt any of the presently known methods for recording information 
on a computer readable medium to generate manufactures comprising one or more of 
the nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 
31, 33, 35, 37, 39, and the polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 
16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38. 

Computer readable media include magnetically readable media, optically 
readable media, electronically readable media and magnetic/optical media. For 
example, the computer readable media may be a hard disk, a floppy disk, a magnetic 
tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read 
Only Memory (ROM) as well as other types of media known to those skilled in the art. 

The nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 
25, 27, 29, 31, 33, 35, 37, 39, a subset thereof, the polypeptide codes of SEQ ID NOS: 
2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, and a subset 
thereof may be stored and manipulated in a variety of data processor programs in a 
variety of formats. For example, one or more of the nucleic acid codes of SEQ ID 
NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, and one or 
more of the polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 
24, 26, 28, 30, 32, 34, 36, 38 may be stored as ASCII or text in a word processing file, 
such as Microsoft WORD or WORDPERFECT, or in a variety of database programs 
familiar to those of skill in the art, such as DB2 or ORACLE. In addition, many 
computer programs and databases may be used as sequence comparers, identifiers or 
sources of query nucleotide sequences or query polypeptide sequences to be 
compared to one or more of the nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 
15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, and one or more of the polypeptide 
codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 
38. 

The following list is intended not to limit the invention but to provide guidance 
to programs and databases useful with one or more of the nucleic acid codes of SEQ 
ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, and the 
polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 
32, 34, 36, 38. The programs and databases which may be used include, but are not 
limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), 
GeneMine (Molecular Applications Group), Look (Molecular Applications Group), 
MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and 
BLASTX (Altschul et al., J. Mol. Biol. 215:403 (1990)), FASTA (Pearson and Lipman, 
Proc. Natl. Acad. Sci. USA, 85:2444 (1988)), FASTDB (Brutlag et al., Comp. App. 
Biosci. 6:237-245, 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE 
(Molecular Simulations Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen 
(Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), Discover 
(Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular 
Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular 
Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular 
Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular 
Simulations Inc.), WetLab (Molecular Simulations Inc.), WetLab Diversity Explorer 
(Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold 
(Molecular Simulations Inc.), the MDL Available Chemicals Directory database, the 
MDL Drug Data Report data base, the Comprehensive Medicinal Chemistry database, 
Derwent's World Drug Index database, the BioByteMasterFile database, the Genbank 
database, and the Genseqn database. Many other programs and databases would be 
apparent to one of skill in the art given the present disclosure. 

Embodiments of the present invention include systems, particularly computer 
systems that store and manipulate the sequence information described herein. As 
used herein, "a computer system" refers to the hardware components, software 
components, and data storage components used to analyze one or more of the nucleic 
acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39, and the polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 
22, 24, 26, 28, 30, 32, 34, 36, 38. 

Preferably, the computer system is a general purpose system that comprises 
a processor and one or more internal data storage components for storing data, and 
one or more data retrieving devices for retrieving the data stored on the data storage 
components. A skilled artisan can readily appreciate that any one of the currently 
available computer systems is suitable. 

The computer system of Figure 1 illustrates components that may be present 
in a conventional computer system. One skilled in the art will readily appreciate that 
not all components illustrated in Figure 1 are required to practice the invention and, 
likewise, additional components not illustrated in Figure 1 may be present in a 
computer system contemplated for use with the invention. Referring to the computer 
system of Figure 1, the components are connected to a central system bus 116. The 
components include a central processing unit 118 with internal 118 and/or external 
cache memory 120, system memory 122, display adapter 102 connected to a monitor 
100, network adapter 126 which may also be referred to as a network interface, 
internal modem 124, sound adapter 128, I/O controller 132 to which may be connected 
a keyboard 140 and mouse 138, or other suitable input device such as a trackball or 
tablet, as well as external printer 134, and/or any number of external devices such as 
external modems, tape storage drives, or disk drives 136. One or more host bus 
adapters 114 may be connected to the system bus 116. To host bus adapter 114 may 
optionally be connected one or more storage devices such as disk drives 112 
(removable or fixed), floppy drives 110, tape drives 108, digital versatile disk DVD 
drives 106, and compact disk CD ROM drives 104. The storage devices may operate 
in read-only mode and/or in read-write mode. The computer system may optionally 
include multiple central processing units 118, or multiple banks of memory 122. 
Arrows 142 in Figure 1 indicate the interconnection of internal components of the 
computer system. The arrows are illustrative only and do not specify exact connection 
architecture. 

Software for accessing and processing the one or more of the nucleic acid 
codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 
39, and the polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 
24, 26, 28, 30, 32, 34, 36, 38 (such as sequence comparison software, analysis 
software as well as search tools, annotation tools, and modeling tools etc.) may reside 
in main memory 122 during execution. 

In one embodiment, the computer system further comprises a sequence 
comparison software for comparing the nucleic acid codes of a query sequence stored 
on a computer readable medium to a subject sequence which is also stored on a 
computer readable medium; or for comparing the polypeptide code of a query 
sequence stored on a computer readable medium to a subject sequence which is also 
stored on a computer readable medium. A "sequence comparison software" refers to 
one or more programs that are implemented on the computer system to compare 
nucleotide and/or protein sequences with other nucleotide and/or protein sequences 
stored within the data storage means. The design of one example of a sequence 
comparison software is provided in Figures 2A, 2B, 2C and 2D. 

The sequence comparison software will typically employ one or more 
specialized comparator algorithms. Protein and/or nucleic acid sequence similarities 
may be evaluated using any of the variety of sequence comparator algorithms and 
programs known in the art. Such algorithms and programs include, but are in no way 
limited to, TBLASTN, BLASTN, BLASTP, FASTA, TFASTA, CLUSTAL, HMMER, 
MAST, or other suitable algorithm known to those skilled in the art (Pearson and 
Lipman, 1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul et al., 1990, J. 
Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids Res. 22(2):4673- 
4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1990, J. 
Mol. Biol. 215(3):403-410; Altschul et al., 1993, Nature Genetics 3:266-272; Eddy S.R., 
Bioinformatics 14:755-763, 1998; Bailey TL et al., J Steroid Biochem Mol Biol 1997 
May;62(1):29-44). One example of a comparator algorithm is illustrated in Figure 3. 
Sequence comparator algorithms identified in this specification are particularly 
contemplated for use in this aspect of the invention. 

The sequence comparison software will typically employ one or more 
specialized analyzer algorithms. One example of an analyzer algorithm is illustrated in 
Figure 4. Any appropriate analyzer algorithm can be used to evaluate similarities, 




determined by the comparator algorithm, between a query sequence and a subject 
sequence (referred to herein as a query/subject pair). Based on context specific rules, 
the annotation of a subject sequence may be assigned to the query sequence. A 
skilled artisan can readily determine the selection of an appropriate analyzer algorithm 
and appropriate context specific rules. Analyzer algorithms identified elsewhere in this 
specification are particularly contemplated for use in this aspect of the invention. 

Figures 2A, 2B, 2C and 2D together provide a flowchart of one example of a 
sequence comparison software for comparing query sequences to a subject sequence. 
The software determines if a gene or set of genes represented by their nucleotide 
sequence, polypeptide sequence or other representation (the query sequence) is 
significantly similar to the one or more of the nucleic acid codes of SEQ ID NOS: 3, 5, 
7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 and the corresponding 
polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 
32, 34, 36, 38 of the invention (the subject sequence). The software may be 
implemented in the C or C++ programming language, Java, Perl or other suitable 
programming language known to a person skilled in the art. 

One or more query sequence(s) are accessed by the program by means of 
input from the user 210, accessing a database 208 or opening a text file 206 as 
illustrated in the query initialization subprocess (Figure 2A). The query initialization 
subprocess allows one or more query sequence(s) to be loaded into computer memory 
122, or under control of the program stored on a disk drive 112 or other storage device 
in the form of a query sequence array 216. The query array 216 is one or more query 
nucleotide or polypeptide sequences accompanied by some appropriate identifiers. 
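
A minimal sketch of such an initialization step follows. It assumes the text file uses FASTA-style records (a ">" header line carrying the identifier, followed by sequence lines); the specification does not prescribe a file format, and `SeqRecord` and `load_sequences` are hypothetical names. The same routine could fill either the query array or the subject array.

```python
from typing import Iterable, List, NamedTuple


class SeqRecord(NamedTuple):
    """One array element: an identifier plus its sequence."""
    identifier: str
    sequence: str


def load_sequences(lines: Iterable[str]) -> List[SeqRecord]:
    """Build a sequence array from FASTA-style text lines."""
    records: List[SeqRecord] = []
    ident = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if ident is not None:
                records.append(SeqRecord(ident, "".join(chunks)))
            # The identifier is the first word after '>'.
            ident = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        elif ident is not None:
            chunks.append(line.upper())
    if ident is not None:
        records.append(SeqRecord(ident, "".join(chunks)))
    return records
```

Because the function accepts any iterable of lines, an open file handle, a list of strings typed by a user, or rows fetched from a database can all be passed in unchanged.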

A dataset is accessed by the program by means of input from the user 228, 
accessing a database 226, or opening a text file 224 as illustrated in the subject 
datasource initialization subprocess (Figure 2B). The subject data source initialization 
process refers to the method by which a reference dataset containing one or more 
sequences selected from the nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 
17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 and the corresponding polypeptide codes 
of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 is 
loaded into computer memory 122, or under control of the program stored on a disk 
drive 112 or other storage device in the form of a subject array 234. The subject array 
234 comprises one or more subject nucleotide or polypeptide sequences accompanied 
by some appropriate identifiers. 

The comparison subprocess of Figure 2C illustrates a process by which the 
comparator algorithm 238 is invoked by the software for pairwise comparisons 
between query elements in the query sequence array 216, and subject elements in the 
subject array 234. The "comparator algorithm" of Figure 2C refers to the pair-wise 
comparisons between a query sequence and subject sequence, i.e. a query/subject 
pair from their respective arrays 216, 234. Comparator algorithm 238 may be any 
algorithm that acts on a query/subject pair, including but not limited to homology 
algorithms such as BLAST, Smith Waterman, Fasta, or statistical 
representation/probabilistic algorithms such as Markov models exemplified by 
HMMER, or other suitable algorithm known to one skilled in the art. Suitable 
algorithms would generally require a query/subject pair as input and return a score (an 
indication of likeness between the query and subject), usually through the use of 
appropriate statistical methods such as Karlin-Altschul statistics used in BLAST, 
Forward or Viterbi algorithms used in Markov models, or other suitable statistics known 
to those skilled in the art. 
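
The pairwise structure of the comparison subprocess can be sketched as a loop that hands every query/subject pair to an interchangeable comparator. The shared k-mer fraction used here is only a stand-in chosen to keep the example short; it is not BLAST, Smith Waterman or HMMER, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

Record = Tuple[str, str]                  # (identifier, sequence)
Comparator = Callable[[str, str], float]  # returns a likeness score


def shared_kmer_fraction(query: str, subject: str, k: int = 3) -> float:
    """Fraction of the query's distinct k-mers that also occur in the subject."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    if not q:
        return 0.0
    s = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q & s) / len(q)


def compare_all(
    queries: List[Record],
    subjects: List[Record],
    comparator: Comparator = shared_kmer_fraction,
) -> List[Tuple[str, str, float]]:
    """Score every query/subject pair and return (query_id, subject_id, score)."""
    return [
        (qid, sid, comparator(qseq, sseq))
        for (qid, qseq), (sid, sseq) in product(queries, subjects)
    ]
```

Passing the comparator as a parameter mirrors the text: any algorithm that takes a query/subject pair and returns a score can be substituted without changing the loop.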

The sequence comparison software of Figure 2C also comprises a means of 
analysis of the results of the pair-wise comparisons performed by the comparator 
algorithm 238. The "analysis subprocess" of Figure 2C is a process by which the 
analyzer algorithm 244 is invoked by the software. The "analyzer algorithm" refers to a 
process by which annotation of a subject is assigned to the query based on 
query/subject similarity as determined by the comparator algorithm 238 according to 
context-specific rules coded into the program or dynamically loaded at runtime. 
Context-specific rules are what the program uses to determine if the annotation of the 
subject can be assigned to the query given the context of the comparison. These rules 
allow the software to qualify the overall meaning of the results of the comparator 
algorithm 238. 

In one embodiment, context-specific rules may state that for a set of query 
sequences to be considered representative of a rosaramicin biosynthetic locus, the 
comparator algorithm 238 must determine that the set of query sequences contains at 
least five query sequences that show a statistical similarity to a subject sequence 
corresponding to the polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 
20, 22, 24, 26, 28, 30, 32, 34, 36, 38. Of course preferred context specific rules may 
specify a wide variety of thresholds for identifying rosaramicin biosynthetic genes or 
rosaramicin-producing organisms without departing from the scope of the invention. 
Some thresholds contemplate that at least one query sequence in the set of query 
sequences shows a statistical similarity to the nucleic acid code corresponding to 5, 6, 
7, 8 or more of the polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 
22, 24, 26, 28, 30, 32, 34, 36, 38. Other context specific rules set the level of 
homology required in each of the group and may be set at 70%, 80%, 85%, 90%, 95% 
or 98% in regards to any one or more of the subject sequences. 

In another embodiment context-specific rules may state that for a query 
sequence to be considered indicative of a macrolide, the comparator algorithm 238 
must determine that the query sequence shows a statistical similarity to subject 
sequences corresponding to a nucleic acid sequence code for a polypeptide of SEQ ID 
NOS: 10, 12, 14, 16 and 18, polypeptides having at least 75% homology to a 
polypeptide of SEQ ID NOS: 10, 12, 14, 16 and 18 and fragments comprising at least 
400 consecutive amino acids of the polypeptides of SEQ ID NOS: 10, 12, 14, 16 and 
18. Of course preferred context specific rules may specify a wide variety of thresholds 
for identifying a macrolide protein without departing from the scope of the invention. 
Some context specific rules set the level of homology required of the query sequence at 
70%, 80%, 85%, 90%, 95% or 98%. 

Thus, the analysis subprocess may be employed in conjunction with any 
other context specific rules and may be adapted to suit different embodiments. The 
principal function of the analyzer algorithm 244 is to assign meaning or a diagnosis to 
a query or set of queries based on context specific rules that are application specific 
and may be changed without altering the overall role of the analyzer algorithm 244. 

Finally the sequence comparison software of Figure 2 comprises a means of 
returning the results of the comparisons by the comparator algorithm 238 and 
analyzed by the analyzer algorithm 244 to the user or process that requested the 
comparison or comparisons. The "display / report subprocess" of Figure 2D is the 
process by which the results of the comparisons by the comparator algorithm 238 and 
analyses by the analyzer algorithm 244 are returned to the user or process that 
requested the comparison or comparisons. The results 240, 246 may be written to a 
file 252, displayed in some user interface such as a console, custom graphical 
interface, web interface, or other suitable implementation specific interface, or 
uploaded to some database such as a relational database, or other suitable 
implementation specific database. Once the results have been returned to the user or 
process that requested the comparison or comparisons the program exits. 
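
One way the report step might look is sketched below: results are rendered as tab-separated text that can be printed to a console or written to a file. The column names and the two-decimal score format are assumptions made for the example, not part of the specification.

```python
import csv
import io
from typing import Iterable, Tuple


def format_report(results: Iterable[Tuple[str, str, float]]) -> str:
    """Render (query_id, subject_id, score) rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "subject", "score"])
    for query_id, subject_id, score in results:
        writer.writerow([query_id, subject_id, f"{score:.2f}"])
    return buffer.getvalue()
```

The returned string can be passed to `print` for console display or written to an open file object with its `write` method.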

The principle of the sequence comparison software of Figure 2 is to receive 
or load a query or queries, receive or load a reference dataset, then run a pair-wise 
comparison by means of the comparator algorithm 238, then evaluate the results using 
an analyzer algorithm 244 to arrive at a determination if the query or queries bear 
significant similarity to the reference sequences, and finally return the results to the 
user or calling program or process. 

Figure 3 is a flow diagram illustrating one embodiment of a comparator 
algorithm 238 process in a computer for determining whether two sequences are 
homologous. The comparator algorithm receives a query/subject pair for comparison, 
performs an appropriate comparison, and returns the pair along with a calculated 
degree of similarity. 

Referring to Figure 3, the comparison is initiated at the beginning of 
sequences 304. A match of (x) characters is attempted 306 where (x) is a user 
specified number. If a match is not found the query sequence is advanced 316 by one 
character with respect to the subject, and if the end of the query has not been reached 
318 another match of (x) characters is attempted 306. Thus if no match has been 
found the query is incrementally advanced in entirety past the initial position of the 
subject. Once the end of the query is reached 318, the subject pointer is advanced by 
1 character and the query pointer is set to the beginning of the query 320. If the end of 
the subject has been reached and still no matches have been found a null homology 
result score is assigned 324 and the algorithm returns the pair of sequences along with 
a null score to the calling process or program. The algorithm then exits 326. If instead 
a match is found 308, an extension of the matched region is attempted 310 and the 
match is analyzed statistically 312. The extension may be unidirectional or 
bidirectional. The algorithm continues in a loop extending the matched region and 
computing the homology score, giving penalties for mismatches taking into 
consideration that given the chemical properties of the amino acid side chains (in the 
case of polypeptide comparisons) not all mismatches are equal. For example a 
mismatch of a lysine with an arginine, both of which have basic side chains, receives a 
lesser penalty than a mismatch between lysine and glutamate, which has an acidic side 
chain. The extension loop stops once the accumulated penalty exceeds some user 
specified value, or if the end of either sequence is reached 312. The maximal score is 
stored 314, and the query sequence is advanced 316 by one character with respect to 
the subject, and if the end of the query has not been reached 318 another match of (x) 
characters is attempted 306. The process continues until the entire length of the 
subject has been evaluated for matches to the entire length of the query. All individual 
scores and alignments are stored 314 by the algorithm and an overall score is 
computed 324 and stored. The algorithm returns the pair of sequences along with 
local and global scores to the calling process or program. The algorithm then exits 326. 
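
The penalty-limited extension described above can be sketched as follows. The scoring values (+2 for a match, -1 for a mismatch within the same charge class, -3 otherwise) and the residue groupings are illustrative assumptions rather than a published substitution matrix, and the extension shown is unidirectional and ungapped.

```python
BASIC = set("KRH")   # lysine, arginine, histidine
ACIDIC = set("DE")   # aspartate, glutamate


def pair_score(a: str, b: str) -> int:
    """Score one aligned residue pair; similar side chains are penalized less."""
    if a == b:
        return 2
    if (a in BASIC and b in BASIC) or (a in ACIDIC and b in ACIDIC):
        return -1
    return -3


def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 max_penalty: int):
    """Extend an alignment rightwards until the accumulated penalty is too high.

    Returns (best_score, best_length): the maximal score seen and the
    extension length at which it was reached.
    """
    score = best_score = best_len = penalty = k = 0
    while q_start + k < len(query) and s_start + k < len(subject):
        s = pair_score(query[q_start + k], subject[s_start + k])
        score += s
        if s < 0:
            penalty -= s  # accumulate the magnitude of each mismatch penalty
        k += 1
        if penalty > max_penalty:
            break  # accumulated penalty exceeds the user specified value
        if score > best_score:
            best_score, best_len = score, k
    return best_score, best_len
```

Under these assumed values a lysine/arginine mismatch costs 1 while a lysine/glutamate mismatch costs 3, so the first is more likely to be bridged by the extension than the second.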

One example of comparator algorithm 238 may be represented in 
pseudocode as follows: 

INPUT:  Q[m]: query, m is the length 
        S[n]: subject, n is the length 
        x: x is the size of a segment 

START: 
    for each i in [1,n] do 
        for each j in [1,m] do 
            if ( j + x - 1 ) <= m and ( i + x - 1 ) <= n then 
                if Q(j, j+x-1) = S(i, i+x-1) then 
                    k=1; 
                    while Q(j, j+x-1+k) = S(i, i+x-1+k) do 
                        k++; 
                    Store highest local homology 
    Compute overall homology score 
    Return local and overall homology scores 
END. 
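
The pseudocode can be turned into a short runnable sketch. Two details are assumptions because the pseudocode leaves them open: the "local homology" is taken to be the length of the longest exact match grown from a seed of x characters, and the "overall homology score" is taken to be that length divided by the length of the shorter sequence. Positions are 0-based, and the function name is hypothetical.

```python
def seed_and_extend(query: str, subject: str, x: int):
    """Find the longest exact match that starts from a seed of x characters.

    Returns (best_length, query_pos, subject_pos, overall_score); the
    positions are -1 when no seed of length x matches.
    """
    if x < 1:
        raise ValueError("segment size x must be at least 1")
    m, n = len(query), len(subject)
    best_len, best_q, best_s = 0, -1, -1
    for i in range(n - x + 1):          # subject start position
        for j in range(m - x + 1):      # query start position
            if query[j:j + x] != subject[i:i + x]:
                continue                # no seed match here
            k = x
            # Extend the seed one character at a time while both still agree.
            while j + k < m and i + k < n and query[j + k] == subject[i + k]:
                k += 1
            if k > best_len:            # store highest local homology
                best_len, best_q, best_s = k, j, i
    shorter = min(m, n)
    overall = best_len / shorter if shorter else 0.0
    return best_len, best_q, best_s, overall
```

This version only extends exact matches, as the pseudocode does; it does not apply mismatch penalties or gaps.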


The comparator algorithm 238 may be written for use on nucleotide 
sequences, in which case the scoring scheme would be implemented so as to 
calculate scores and apply penalties based on the chemical nature of nucleotides. The 
comparator algorithm 238 may also provide for the presence of gaps in the scoring 
method for nucleotide or polypeptide sequences. 

BLAST is one implementation of the comparator algorithm 238. HMMER is 
another implementation of the comparator algorithm 238 based on Markov model 
analysis. In a HMMER implementation a query sequence would be compared to a 
mathematical model representative of a subject sequence or sequences rather than 
using sequence homology. 

Figure 4 is a flow diagram illustrating an analyzer algorithm 244 process for 
detecting the presence of a rosaramicin biosynthetic locus. The analyzer algorithm of 
Figure 4 may be used in the process by which the annotation of a subject is assigned 
to the query based on their similarity as determined by the comparator algorithm 238 
and according to context-specific rules coded into the program or dynamically loaded 
at runtime. Context sensitive rules are what determines if the annotation of the subject 
can be assigned to the query given the context of the comparison. Context specific 
rules set the thresholds for determining the level and quality of similarity that would be 
accepted in the process of evaluating matched pairs. 

The analyzer algorithm 244 receives as its input an array of pairs that had 
been matched by the comparator algorithm 238. The array consists of at least a query 
identifier, a subject identifier and the associated value of the measure of their similarity. 
To determine if a group of query sequences includes sequences diagnostic of a 
rosaramicin biosynthetic gene cluster, a reference or diagnostic array 406 is generated 
by accessing a data source and retrieving rosaramicin specific information 404 relating 
to nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 
31, 33, 35, 37, 39 and the corresponding polypeptide codes of SEQ ID NOS: 2, 4, 6, 8, 
10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38. Diagnostic array 406 consists 
at least of subject identifiers and their associated annotation. Annotation may include 
reference to the protein families ABCC, DATF, GTFA, MTFA, MTRA, NBPA, OXRB, 
OXRC, OXRH, PKSH, REGM, REGS, SURA and TESA. Annotation may also include 
information regarding presence in loci of a specific structural class or may include 
previously computed matches to other databases, for example databases of motifs. 

Once the algorithm has successfully generated or received the two
necessary arrays 402, 406, and holds in memory any context specific rules, each
matched pair as determined by the comparator algorithm 238 can be evaluated. The
algorithm will perform an evaluation 408 of each matched pair and, based on the
context specific rules, confirm or fail to confirm the match as valid 410. In cases of
successful confirmation of the match 410 the annotation of the subject is assigned to
the query. Results of each comparison are stored 412. The loop ends when the end of
the query / subject array is reached. Once all query / subject pairs have been
evaluated against one or more of the nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11,
13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 and the polypeptide codes of SEQ
ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 in the
subject array, a final determination can be made if the query set of ORFs represents a
rosaramicin locus 416. The algorithm then returns the overall diagnosis and an array
of characterized query / subject pairs along with supporting evidence to the calling
program or process and then terminates 418.
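
The evaluation loop described above can be illustrated with a minimal Python sketch. It is not the implementation of analyzer algorithm 244. The names `MatchedPair` and `analyze`, the two thresholds, and the rule that a locus is diagnosed once a minimum number of distinct protein families is confirmed are all assumptions made for illustration, as is the convention that a larger similarity value means a closer match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchedPair:
    """One entry of the array produced by the comparator (hypothetical layout)."""
    query_id: str
    subject_id: str
    similarity: float  # assumed: larger value means a closer match


def analyze(pairs, diagnostic, min_similarity, min_families):
    """Evaluate every matched pair against a diagnostic array.

    diagnostic maps subject identifiers to an annotation (here a family name).
    min_similarity and min_families stand in for the context specific rules;
    both are illustrative assumptions, not values taken from the text.
    """
    results = []
    for pair in pairs:  # the loop ends when the array is exhausted
        annotation = diagnostic.get(pair.subject_id)
        confirmed = annotation is not None and pair.similarity >= min_similarity
        # On confirmation, the subject's annotation is assigned to the query.
        results.append((pair, annotation if confirmed else None))
    families = {annotation for _, annotation in results if annotation is not None}
    is_locus = len(families) >= min_families  # assumed final-determination rule
    return is_locus, results


pairs = [
    MatchedPair("query1", "subjectA", 0.9),
    MatchedPair("query2", "subjectB", 0.4),
    MatchedPair("query3", "subjectC", 0.8),
]
diagnostic = {"subjectA": "PKSH", "subjectB": "GTFA", "subjectC": "OXRC"}
is_locus, results = analyze(pairs, diagnostic, min_similarity=0.5, min_families=2)
```

Because the diagnostic array and the thresholds are passed in as arguments, the same loop could be reused with a different diagnostic array, in keeping with the dynamic loading described for the analyzer.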

The analyzer algorithm 244 may be configured to dynamically load different 
diagnostic arrays and context specific rules. It may be used, for example, in the
comparison of query/subject pairs with diagnostic subjects for other biosynthetic 
pathways, such as macrolide biosynthetic pathways. 

Thus one embodiment of the present invention is a computer readable
medium having stored thereon a sequence selected from the group consisting of a
nucleic acid code of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31,
33, 35, 37, 39 and a polypeptide code of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, 32, 34, 36, 38. Another aspect of the present invention is a
computer readable medium having recorded thereon one or more nucleic acid codes
of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39,
preferably at least 2, 5, 10, 15, or 20 nucleic acid codes of SEQ ID NOS: 3, 5, 7, 9, 11,
13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39. Another aspect of the invention is
a computer readable medium having recorded thereon one or more of the polypeptide
codes of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36,
38, preferably at least 2, 5, 10, 15 or 20 polypeptide codes of SEQ ID NOS: 2, 4, 6, 8,
10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38.

Another embodiment of the present invention is a computer system
comprising a processor and a data storage device wherein said data storage device
has stored thereon a reference sequence selected from the group consisting of a
nucleic acid code of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31,
33, 35, 37, 39 and a polypeptide code of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, 32, 34, 36, 38.

Computer readable media include magnetically readable media, optically
readable media, electronically readable media and magnetic/optical media. For
example, the computer readable media may be a hard disk, a floppy disk, a magnetic
tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read
Only Memory (ROM) as well as other types of media known to those skilled in the art.

The present invention will be further described with reference to the following
examples; however, it is to be understood that the present invention is not limited to
such examples:

EXAMPLE 1: Identification and sequencing of a rosaramicin biosynthetic locus in
Micromonospora carbonacea var. aurantiaca NRRL 2997

Micromonospora carbonacea var. aurantiaca NRRL 2997 was obtained from
the Agricultural Research Service collection (National Center for Agricultural Utilization
Research, 1815 N. University Street, Peoria, Illinois 61604) and cultured using
standard microbiological techniques (Kieser et al., supra). This organism was
propagated on oatmeal agar medium at 28 degrees Celsius for several days. For
isolation of high molecular weight genomic DNA, cell mass from three freshly grown,
near confluent 100 mm petri dishes was used. The cell mass was collected by gentle
scraping with a plastic spatula. Residual agar medium was removed by repeated
washes with STE buffer (75 mM NaCl; 20 mM Tris-HCl, pH 8.0; 25 mM EDTA). High
molecular weight DNA was isolated by established protocols (Kieser et al., supra) and
its integrity was verified by field inversion gel electrophoresis (FIGE) using the preset
program number 6 of the FIGE MAPPER™ power supply (BIORAD). This high
molecular weight genomic DNA served for the preparation of a small size fragment
genomic sampling library (GSL), as well as a large size fragment cluster identification
library (CIL). Both libraries contained randomly generated M. carbonacea genomic
DNA fragments and, therefore, are representative of the entire genome of this
organism.

For the generation of the GSL library, genomic DNA was randomly sheared
by sonication. DNA fragments having a size range between 1.5 and 3 kb were
fractionated on an agarose gel and isolated using standard molecular biology
techniques (Sambrook et al., supra). The ends of the obtained DNA fragments were
repaired using T4 DNA polymerase (Roche) as described by the supplier. This
enzyme creates DNA fragments with blunt ends that can be subsequently cloned into
an appropriate vector. The repaired DNA fragments were subcloned into a derivative
of pBluescript SK+ vector (Stratagene) which does not allow transcription of cloned
DNA fragments. This vector was selected as it contains a convenient polylinker region
surrounded by sequences corresponding to universal sequencing primers such as T3,
T7, SK, and KS (Stratagene). The unique EcoRV restriction site found in the polylinker
region was used as it allows insertion of blunt-end DNA fragments. Ligation of the
inserts, use of the ligation products to transform E. coli DH10B (Invitrogen) host and
selection for recombinant clones were performed as previously described (Sambrook
et al., supra). Plasmid DNA carrying the M. carbonacea genomic DNA fragments was
extracted by the alkaline lysis method (Sambrook et al., supra) and the insert size of
1.5 to 3 kb was confirmed by electrophoresis on agarose gels. Using this procedure, a
library of small size random genomic DNA fragments is generated that covers the
entire genome of the studied microorganism. The number of individual clones that can
be generated is infinite but only a small number is further analyzed to sample the
microorganism's genome.

A CIL library was constructed from the M. carbonacea high molecular weight
genomic DNA using the SuperCos-1 cosmid vector (Stratagene™). The cosmid arms
were prepared as specified by the manufacturer. The high molecular weight DNA was
subjected to partial digestion at 37 degrees Celsius with approximately one unit of
Sau3AI restriction enzyme (New England Biolabs) per 100 micrograms of DNA in the
buffer supplied by the manufacturer. This procedure generates random fragments of
DNA ranging from the initial undigested size of the DNA to short fragments whose
length depends upon the frequency of the enzyme's DNA recognition site in the
genome and the extent of the DNA digestion by the enzyme. At various timepoints,
aliquots of the digestion were transferred to new microfuge tubes and the enzyme was
inactivated by adding EDTA and SDS to final concentrations of 10 mM and 0.1%,
respectively. Aliquots
judged by FIGE analysis to contain a significant fraction of DNA in the desired size
range (30-50 kb) were pooled, extracted with phenol/chloroform (1:1 vol:vol), and
pelleted by ethanol precipitation. The 5' ends of Sau3AI DNA fragments were
dephosphorylated using alkaline phosphatase (Roche) according to the manufacturer's
specifications at 37 degrees Celsius for 30 min. The phosphatase was heat
inactivated at 70 degrees Celsius for 10 min and the DNA was extracted with
phenol/chloroform (1:1 vol:vol), pelleted by ethanol precipitation, and resuspended in
sterile water. The dephosphorylated Sau3AI DNA fragments were then ligated
overnight at room temperature to the SuperCos-1 cosmid arms in a reaction containing
approximately four-fold molar excess SuperCos-1 cosmid arms. The ligation products
were packaged using Gigapack® III XL packaging extracts (Stratagene™) according to
the manufacturer's specifications. The CIL library consisted of 864 isolated cosmid
clones in E. coli DH10B (Invitrogen). These clones were picked and inoculated into
nine 96-well microtiter plates containing LB broth (per liter of water: 10.0 g NaCl; 10.0
g tryptone; 5.0 g yeast extract) which were grown overnight and then adjusted to
contain a final concentration of 25% glycerol. These microtiter plates were stored at
-80 degrees Celsius and served as glycerol stocks of the CIL library. Duplicate
microtiter plates were arrayed onto nylon membranes as follows. Cultures grown on
microtiter plates were concentrated by pelleting and resuspending in a small volume of
LB broth. A 3 X 3 grid (96-pin) was arrayed onto nylon membranes. These
membranes representing the complete CIL library were then layered onto LB agar and
incubated overnight at 37 degrees Celsius to allow the colonies to grow. The
membranes were layered onto filter paper pre-soaked with 0.5 N NaOH/1.5 M NaCl for
10 min to denature the DNA and then neutralized by transferring onto filter paper
pre-soaked with 0.5 M Tris (pH 8)/1.5 M NaCl for 10 min. Cell debris was gently scraped
off with a plastic spatula and the DNA was crosslinked onto the membranes by UV
irradiation using a GS GENE LINKER™ UV Chamber (BIORAD). Considering an
average size of 8 Mb for an actinomycete genome and an average size of 35 kb of
genomic insert in the CIL library, this library represents roughly a 4-fold coverage of
the microorganism's entire genome.
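
The coverage estimate follows directly from the figures given above (nine 96-well plates, 864 clones, 35 kb average insert, 8 Mb assumed genome size). The short Python sketch below reproduces the arithmetic; the function name `library_coverage` is illustrative only.

```python
def library_coverage(n_clones, mean_insert_kb, genome_mb):
    """Return the fold coverage of a genome by a clone library."""
    total_insert_kb = n_clones * mean_insert_kb
    return total_insert_kb / (genome_mb * 1000.0)


n_clones = 9 * 96  # nine 96-well microtiter plates
coverage = library_coverage(n_clones, mean_insert_kb=35, genome_mb=8)
print(f"{n_clones} clones give about {coverage:.2f}-fold coverage")
```

The result, about 3.78-fold, is consistent with the "roughly 4-fold" figure stated in the text.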

The GSL library was analyzed by sequence determination of the cloned
genomic DNA inserts. The universal primers KS or T7, referred to as forward (F)
primers, were used to initiate polymerization of labeled DNA. Extension of at least 700
bp from the priming site can be routinely achieved using the TF, BDT v2.0 sequencing
kit as specified by the supplier (Applied Biosystems). Sequence analysis of the small
genomic DNA fragments (Genomic Sequence Tags, GSTs) was performed using a
3700 ABI capillary electrophoresis DNA sequencer (Applied Biosystems). The
average length of the DNA sequence reads was approximately 700 bp. Further analysis of the
obtained GSTs was performed by sequence homology comparison to various protein
sequence databases. The DNA sequences of the obtained GSTs were translated into
amino acid sequences and compared to the National Center for Biotechnology
Information (NCBI) nonredundant protein database and the proprietary Ecopia natural
product biosynthetic gene Decipher™ database using previously described algorithms
(Altschul et al., supra). Sequence similarity with known proteins of defined function in
the database enables one to make predictions on the function of the partial protein that
is encoded by the translated GST.
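
The translation step can be illustrated as follows. The text does not say how the GSTs were translated; the sketch below assumes a conventional six-frame translation with the standard genetic code, and the function names are hypothetical. Codons containing characters other than A, C, G or T are rendered as "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return three forward-strand frames followed by three reverse-strand frames."""
    frames = []
    for strand in (dna.upper(), reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


frames = six_frame_translations("ATGGCCTAA")
```

Each of the six translated frames would then be compared against the protein databases, since the orientation and reading frame of a randomly cloned fragment are not known in advance.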

A total of 437 M. carbonacea GSTs were generated using the forward
sequencing primer and analyzed by sequence comparison using the Blast algorithm
(Altschul et al., supra). Sequence alignments displaying an E value of e-5 or lower
(i.e., at least that significant)
were considered as significantly homologous and retained for further evaluation.
GSTs showing similarity to a gene of interest can be at this point selected and used to
identify larger segments of genomic DNA from the CIL library that include the gene(s)
of interest. Polyketide natural products are often synthesized by type I polyketide
synthases (PKSs). Several forward GST reads were identified as portions of PKS
genes. For example, one such GST encoded an internal portion of a PKS acyl
transferase (AT) domain in the antisense orientation relative to the sequencing primer.
The GSL clone from which this GST was obtained was also sequenced using the
reverse sequencing primer and was found to encode the N-terminal portion of a PKS
ketosynthase (KS) domain in the sense orientation relative to the sequencing primer.
Based on the sequence of the forward read of this GSL clone, a 20mer oligonucleotide
was designed for use as a probe to identify and isolate CIL clones which harbored the
sequences of interest.
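
As a hedged illustration of the significance filter, the sketch below retains hits whose E value does not exceed 1e-5, on the understanding that a smaller E value denotes a more significant alignment. The hit records and their field names are invented for the example and do not come from the GST data set.

```python
E_VALUE_CUTOFF = 1e-5  # threshold taken from the text; smaller is more significant


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose E value is at or below the cutoff."""
    return [hit for hit in hits if hit["evalue"] <= cutoff]


# Hypothetical hit records for three GSTs (identifiers are made up).
hits = [
    {"gst": "gst_001", "subject": "PKS acyltransferase domain", "evalue": 3e-42},
    {"gst": "gst_002", "subject": "hypothetical protein", "evalue": 0.8},
    {"gst": "gst_003", "subject": "PKS ketosynthase domain", "evalue": 1e-5},
]
retained = significant_hits(hits)
```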

Hybridization oligonucleotide probes were radiolabeled with 32P using T4
polynucleotide kinase (New England Biolabs) in 15 microliter reactions containing 5
picomoles of oligonucleotide and 6.6 picomoles of [γ-32P]ATP in the kinase reaction
buffer supplied by the manufacturer. After 1 hour at 37 degrees Celsius, the kinase
reaction was terminated by the addition of EDTA to a final concentration of 5 mM. The
specific activity of the radiolabeled oligonucleotide probes was estimated using a
Model 3 Geiger counter (Ludlum Measurements Inc., Sweetwater, Texas) with a
built-in integrator feature. The radiolabeled oligonucleotide probes were heat-denatured by
incubation at 86 degrees Celsius for 10 minutes and quick-cooled in an ice bath
immediately prior to use.

The CIL library membranes were pretreated by incubation for at least 2
hours at 42 degrees Celsius in Prehyb Solution (6X SSC; 20 mM NaH2PO4; 5X
Denhardt's; 0.4% SDS; 0.1 mg/ml sonicated, denatured salmon sperm DNA) using a
hybridization oven with gentle rotation. The membranes were then placed in Hyb
Solution (6X SSC; 20 mM NaH2PO4; 0.4% SDS; 0.1 mg/ml sonicated, denatured
salmon sperm DNA) containing 1X10^ cpm/ml of radiolabeled oligonucleotide probe
and incubated overnight at 42 degrees Celsius using a hybridization oven with gentle
rotation. The next day, the membranes were washed with Wash Buffer (6X SSC, 0.1%
SDS) for 45 minutes each at 46, 48, and 50 degrees Celsius using a hybridization
oven with gentle rotation. The membranes were then exposed to X-ray film to
visualize and identify the positive cosmid clones. Positive clones were identified,
cosmid DNA was extracted from 30 ml cultures using the alkaline lysis method
(Sambrook et al., supra) and the inserts were entirely sequenced using a shotgun
sequencing approach (Fleischmann et al., Science, 269:496-512).

Sequencing reads were assembled using the Phred-Phrap™ algorithm
(University of Washington, Seattle, USA) recreating the entire DNA sequence of the
cosmid insert. Reiterations of hybridizations of the CIL library with probes derived from
the ends of the original cosmid allow indefinite extension of sequence information on
both sides of the original cosmid sequence until the complete sought-after gene cluster
is obtained. Three overlapping cosmid clones that were either directly identified by the
original oligonucleotide probe (derived from the GSL clone) or by probes derived from
the ends of the original cosmids have been completely sequenced to provide over 60
kb of genetic information. Subsequently, the forward and reverse reads of the GSL
clone from which the original oligonucleotide probe was derived were mapped to a
region of the rosaramicin biosynthetic locus that encodes a portion of the PKS gene
identified herein as ORF 7, more specifically nucleotides encoding amino acids 1531 to
2416 approximately. This corresponds to a GSL clone with an insert size of
approximately 2.6 kb, in good agreement with the selected size range of 1.5-3 kb
described above. The sequence of these cosmids and analysis of the proteins
encoded by them undoubtedly demonstrated that the gene cluster obtained was
indeed responsible for the production of a glycosylated macrolide consistent with the
known structure of rosaramicin, which was not previously reported to be produced by
M. carbonacea var. aurantiaca NRRL 2997.
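
The stated insert size can be checked from the amino acid positions given above, since each amino acid corresponds to three nucleotides. The following sketch is only a restatement of that arithmetic; the function name is illustrative.

```python
def coding_span_bp(first_aa, last_aa):
    """Nucleotides needed to encode residues first_aa..last_aa inclusive."""
    return (last_aa - first_aa + 1) * 3


insert_bp = coding_span_bp(1531, 2416)
print(f"insert spans about {insert_bp / 1000:.1f} kb")
```

The span of 2658 bp is about 2.6 kb (truncated to one decimal) and falls inside the 1.5 to 3 kb window selected when the GSL library was prepared.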

Example 2: Genes and proteins involved in biosynthesis of rosaramicin

The rosaramicin locus includes the 60196 base pairs provided in SEQ ID
NO: 1 and contains the 19 ORFs provided in SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19,
21, 23, 25, 27, 29, 31, 33, 35, 37, 39. More than 19 kilobases of DNA sequence were
analyzed on each side of the rosaramicin locus and these regions contain primary
metabolic genes. The accompanying sequence listing provides the nucleotide
sequence of the 19 ORFs regulating the biosynthesis of rosaramicin and the
corresponding deduced polypeptides, wherein ORF 1 (SEQ ID NO: 3) represents the
polynucleotide drawn from residues 1 to 1683 (sense strand) of SEQ ID NO: 1; ORF 2
(SEQ ID NO: 5) represents the polynucleotide drawn from residues 2522 to 1728
(antisense strand) of SEQ ID NO: 1; ORF 3 (SEQ ID NO: 7) represents the
polynucleotide drawn from residues 3861 to 2629 (antisense strand) of SEQ ID NO: 1;
ORF 4 (SEQ ID NO: 9) represents the polynucleotide drawn from residues 4365 to
5573 (sense strand) of SEQ ID NO: 1; ORF 5 (SEQ ID NO: 11) represents the
polynucleotide drawn from residues 5702 to 19117 (sense strand) of SEQ ID NO: 1;
ORF 6 (SEQ ID NO: 13) represents the polynucleotide drawn from residues 19144 to
24921 (sense strand) of SEQ ID NO: 1; ORF 7 (SEQ ID NO: 15) represents the
polynucleotide drawn from residues 24993 to 36230 (sense strand) of SEQ ID NO: 1;
ORF 8 (SEQ ID NO: 17) represents the polynucleotide drawn from residues 36292 to
41016 (sense strand) of SEQ ID NO: 1; ORF 9 (SEQ ID NO: 19) represents the
polynucleotide drawn from residues 41049 to 46403 (sense strand) of SEQ ID NO: 1;
ORF 10 (SEQ ID NO: 21) represents the polynucleotide drawn from residues 46400 to
47794 (sense strand) of SEQ ID NO: 1; ORF 11 (SEQ ID NO: 23) represents the
polynucleotide drawn from residues 47794 to 49083 (sense strand) of SEQ ID NO: 1;
ORF 12 (SEQ ID NO: 25) represents the polynucleotide drawn from residues 49092 to
49814 (sense strand) of SEQ ID NO: 1; ORF 13 (SEQ ID NO: 27) represents the
polynucleotide drawn from residues 49868 to 51226 (sense strand) of SEQ ID NO: 1;
ORF 14 (SEQ ID NO: 29) represents the polynucleotide drawn from residues 51506 to
53416 (sense strand) of SEQ ID NO: 1; ORF 15 (SEQ ID NO: 31) represents the
polynucleotide drawn from residues 54569 to 53358 (antisense strand) of SEQ ID NO:
1; ORF 16 (SEQ ID NO: 33) represents the polynucleotide drawn from residues 54897
to 56342 (sense strand) of SEQ ID NO: 1; ORF 17 (SEQ ID NO: 35) represents the
polynucleotide drawn from residues 56408 to 57634 (sense strand) of SEQ ID NO: 1;
ORF 18 (SEQ ID NO: 37) represents the polynucleotide drawn from residues 57657 to
59123 (sense strand) of SEQ ID NO: 1; ORF 19 (SEQ ID NO: 39) represents the
polynucleotide drawn from residues 59363 to 60196 (sense strand) of SEQ ID NO: 1.
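
The coordinate convention used above (1-based, inclusive residues, with antisense ORFs listed from the higher to the lower coordinate) can be sketched in Python as follows. The coordinates are those recited in the text; the function names and the toy sequence are illustrative assumptions, and the sequence of SEQ ID NO: 1 itself is not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# ORF number -> (first residue, last residue) in SEQ ID NO: 1, as recited above.
# A first residue larger than the last residue denotes the antisense strand.
ORF_COORDS = {
    1: (1, 1683), 2: (2522, 1728), 3: (3861, 2629), 4: (4365, 5573),
    5: (5702, 19117), 6: (19144, 24921), 7: (24993, 36230), 8: (36292, 41016),
    9: (41049, 46403), 10: (46400, 47794), 11: (47794, 49083),
    12: (49092, 49814), 13: (49868, 51226), 14: (51506, 53416),
    15: (54569, 53358), 16: (54897, 56342), 17: (56408, 57634),
    18: (57657, 59123), 19: (59363, 60196),
}


def orf_length(start, end):
    """Length in nucleotides of an ORF given 1-based inclusive coordinates."""
    return abs(end - start) + 1


def extract_orf(locus, start, end):
    """Return the ORF sequence; antisense ORFs are reverse-complemented."""
    if start <= end:
        return locus[start - 1:end]
    return locus[end - 1:start].translate(COMPLEMENT)[::-1]


toy_locus = "AAACCCGGGTTT"  # stand-in sequence, not SEQ ID NO: 1
sense = extract_orf(toy_locus, 4, 6)
antisense = extract_orf(toy_locus, 12, 7)
```

A useful consistency check on the recited coordinates is that every ORF length is a whole number of codons, which the test below confirms.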

Some open reading frames listed herein initiate with non-standard initiation
codons (e.g. GTG - Valine or CTG - Leucine) rather than the standard initiation codon
ATG, namely ORFs 1, 6, 7, 10, 14 and 18. All ORFs are listed with the appropriate M,
V or L amino acids at the amino-terminal position to indicate the specificity of the first
codon of the ORF. It is expected, however, that in all cases the biosynthesized protein
will contain a methionine residue, and more specifically a formylmethionine residue, at
the amino terminal position, in keeping with the widely accepted principle that protein
synthesis in bacteria initiates with methionine (formylmethionine) even when the
encoding gene specifies a non-standard initiation codon (e.g. Stryer, Biochemistry 3rd
edition, 1998, W.H. Freeman and Co., New York, pp. 752-754).

Three deposits, namely E. coli DH10B (O10CK) strain, E. coli DH10B
(O10CF) strain and E. coli DH10B (O10CJ) strain, each harbouring a cosmid clone of a
partial biosynthetic locus for rosaramicin from Micromonospora carbonacea subsp.
aurantiaca, have been deposited with the International Depositary Authority of Canada,
Bureau of Microbiology, Health Canada, 1015 Arlington Street, Winnipeg, Manitoba,
Canada R3E 3R2 on July 10, 2002 and were assigned deposit accession numbers
IDAC 100702-1, 100702-2 and 100702-3 respectively. The E. coli strain deposits are
referred to herein as "the deposited strains".

The cosmids harbored in the deposited strains comprise a complete 
biosynthetic locus for rosaramicin. The sequence of the polynucleotides comprised in
the deposited strains, as well as the amino acid sequence of any polypeptide encoded 
thereby are controlling in the event of any conflict with any description of sequences 
herein. 

The deposit of the deposited strains has been made under the terms of the
Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for
Purposes of Patent Procedure. The deposited strains will be irrevocably and without
restriction or condition released to the public upon the issuance of a patent. The
deposited strains are provided merely as a convenience to those skilled in the art and
are not an admission that a deposit is required for enablement, such as that required
under 35 U.S.C. §112. A license may be required to make, use or sell the deposited
strains, and compounds derived therefrom, and no such license is hereby granted.

The order and relative position of the 19 open reading frames and the
corresponding polypeptides of the biosynthetic locus for rosaramicin are provided in
Figure 5. The arrows represent the orientation of the ORFs of the rosaramicin
biosynthetic locus. The top line in Figure 5 provides a scale in kilobase pairs. The
black bars depict the part of the locus covered by each of the deposited cosmids
O10CK, O10CF and O10CJ.

In order to identify the function of the genes in the rosaramicin locus, SEQ ID
NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38 were
compared, using the BLASTP version 2.2.1 algorithm with the default parameters, to
sequences in the National Center for Biotechnology Information (NCBI) nonredundant
protein database and the DECIPHER™ database of microbial genes, pathways and
natural products (Ecopia Biosciences Inc., St.-Laurent, QC, Canada).

The accession numbers of the top GenBank hits of this BLAST analysis are
presented in Table 2 along with the corresponding E value. The E value is the
expected number of chance alignments with an alignment score at least equal to the
observed alignment score. An E value of 0.00 indicates a perfect homolog or nearly
perfect homolog. The E values are calculated as described in Altschul et al. J. Mol.
Biol., October 5; 215(3) 403-10. The E value assists in the determination of whether
two sequences display sufficient similarity to justify an inference of homology.

Example 3: Formation of rosaramicin

Rosaramicin is a 16-membered macrolide having an
epoxide, an aldehyde and a deoxyamino sugar. The rosaramicin locus includes five
polyketide synthase (PKS) Type I genes. ORF 5 represents a PKS Type I gene having
a domain arrangement of KS-AT-ACP-KS-AT-KR-ACP-KS-AT-DH-KR-ACP. ORF 6
represents a PKS Type I gene having a domain arrangement of KS-AT-DH-KR-ACP.
ORF 7 represents a PKS Type I gene having a domain arrangement of
KS-AT-KR-ACP-KS-AT-DH-ER-KR-ACP. ORF 8 represents a PKS Type I gene having a domain
arrangement of KS-AT-KR-ACP. ORF 9 represents a PKS Type I gene having a
domain arrangement of KS-AT-KR-ACP-Te.
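
The domain arrangements recited above can be grouped into modules programmatically. The sketch below assumes, for illustration only, that each module begins at a KS domain; the function name `split_modules` is hypothetical.

```python
# Domain arrangements of the five PKS Type I genes, as recited above.
PKS_DOMAINS = {
    5: "KS-AT-ACP-KS-AT-KR-ACP-KS-AT-DH-KR-ACP",
    6: "KS-AT-DH-KR-ACP",
    7: "KS-AT-KR-ACP-KS-AT-DH-ER-KR-ACP",
    8: "KS-AT-KR-ACP",
    9: "KS-AT-KR-ACP-Te",
}


def split_modules(arrangement):
    """Split a dash-separated domain string into modules, one per KS domain."""
    modules = []
    for domain in arrangement.split("-"):
        if domain == "KS" or not modules:
            modules.append([])  # assumed rule: a KS domain opens a new module
        modules[-1].append(domain)
    return modules


modules_per_orf = {orf: split_modules(arr) for orf, arr in PKS_DOMAINS.items()}
total_modules = sum(len(mods) for mods in modules_per_orf.values())
```

Under this rule the five genes comprise eight modules in total: three in ORF 5, two in ORF 7, and one each in ORFs 6, 8 and 9.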

While not intending to be limited to any particular mode of action or 
biosynthetic scheme, the gene products of the invention can explain the synthesis of 
rosaramicin. ORFs 5, 6, 7, 8, and 9 constitute a polyketide synthase system that 
assembles the core polyketide precursor of rosaramicin. Figure 6 highlights 
schematically the series of reactions catalyzed by this polyketide synthase system 
based on the correlation between the deduced domain architecture and the polyketide 
core of rosaramicin. Type I PKS domains and the reactions they carry out are well 
known to those skilled in the art and well documented in the literature, see for example, 
Hopwood (1997) Chem. Rev. Vol 97 pp. 2465-2497. 

Figure 7 depicts a proposed biochemical pathway involving the OXRB, DATF,
SURA, MTFA gene products for the formation of the deoxyamino sugar. This sugar is
transferred to the core polyketide precursor of rosaramicin by the GTFA gene product.
Also depicted in Figure 7 are the oxidation reactions carried out by two cytochrome
P450 monooxygenases, OXRC1 and OXRC2, referring to ORFs 3 and 4, respectively.
OXRC1 is expected to catalyze the formation of an aldehyde while OXRC2 is expected
to catalyze the formation of an epoxide. While Figure 7 proposes one scheme in regard
to timing of the glycosylation and oxidation reactions catalyzed by the GTFA, OXRC1
and OXRC2, the invention does not reside in the actual timing and order of the
reactions, which may be different than that depicted in Figure 7.

Figures 8 to 10 are amino acid alignments comparing the rosaramicin PKS
domains. The domains which occur only once in the rosaramicin PKS, namely the
enoylreductase (ER) and thioesterase (Te) domains, are compared to prototypical
domains from the erythromycin PKS system (DEBS). Where applicable, key active site
residues and motifs for the various polyketide synthase domains as described in
Kakavas et al. (1997) J. Bacteriol. Vol 179 pp. 7515-7522 are indicated in Figures 8 to
14. In each of the clustal alignments a line above the alignment is used to mark
strongly conserved positions. In addition, three characters, namely * (asterisk), : (colon)
and . (period) are used, wherein "*" indicates positions which have a single, fully
conserved residue; ":" indicates that one of the following strong groups is fully
conserved: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, and FYW; and
"." indicates that one of the following weaker groups is fully conserved: CSA, ATV, SAG,
STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, FVLIM, and HFY.
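
The marking scheme can be expressed compactly in code. The sketch below assigns one of the three characters to each alignment column using the strong and weaker groups listed above. Treating any column that contains a gap as unmarked is an assumption of this sketch, and the function names are illustrative.

```python
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one column of aligned residues."""
    residues = {residue.upper() for residue in column}
    if "-" in residues:
        return " "  # assumption: columns containing gaps are left unmarked
    if len(residues) == 1:
        return "*"  # single, fully conserved residue
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weaker group
    return " "


def conservation_line(aligned_sequences):
    """Build the marker line for a list of equal-length aligned sequences."""
    return "".join(column_symbol(column) for column in zip(*aligned_sequences))


markers = conservation_line(["CSTW", "CATF"])
```

For example, a column pairing Cys with Gln belongs to none of the listed groups and is therefore left unmarked.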

Of particular relevance with respect to PKS domain function, the KS domain
in the loading module (ORF5|KS1) contains a Gln (Q) in place of the active site Cys (C)
residue (Figure 8), and the KR domain of the first module of ORF 7 (ORF7|KR1)
contains several amino acid substitutions in the key cofactor-binding motif (Figure 12).
Figure 15 shows the high degree of overall homology between ethylmalonyl-CoA-
specific AT domains from the tylosin PKS (TYLO) and the niddamycin PKS (NIDD) and
the second AT domain of rosaramicin ORF 7. This high degree of homology is
indicative of their shared substrate specificity.

REGS and REGM are involved in regulation of gene expression. ABCC, a
membrane transport protein, and MTRA, an rRNA methyltransferase, are involved in
resistance to and/or export of rosaramicin. The TESA gene product represents a
free-standing thioesterase enzyme that is expected to play a "proofreading" role in the
assembly of the rosaramicin core polyketide precursor. The OXRH gene product
represents a crotonyl CoA reductase that is involved in the formation of the acyl-CoA
precursor used by the loading module of ORF 5 and/or the second module of ORF 7.
The step involving crotonyl CoA reductase, i.e. the OXRH gene product, is expected to
be a rate-limiting step in the biosynthesis of rosaramicin (Stassi D.L. et al., Proc Natl
Acad Sci 95(13), 7305-9, June 23, 1998); and it is expected that increasing the levels of
the OXRH enzyme will have a beneficial effect on the yield of rosaramicin. The NBPA
gene product is a nucleotide binding protein (i.e., contains a GTP/ATP binding motif)
and is expected to activate a sugar by tethering it to a nucleotide, usually TTP.
Therefore, the NBPA gene product is expected to be involved in the first step in the
pathway leading to the formation of the deoxyamino sugar of rosaramicin.

Example 4: Fermentation of Micromonospora carbonacea aurantiaca and detection of
rosaramicin:

Micromonospora carbonacea aurantiaca NRRL 2997 was cultured on a 30 ml
media A plate (glucose 1.0%, dextrin 4.0%, sucrose 1.5%, casein enzymatic
hydrolysate 1.0%, MgSO4 0.1%, CaCO3 0.2%, and agar 2.2 g/100 ml) at 30°C for 14
days. The cells and agar were added to 25 ml of 95% ethanol and incubated at room
temperature for 2 h under agitation. The ethanol phase was collected and the extraction
step was repeated under the same conditions. The ethanol was evaporated from the
pooled extracts and the residue was freeze-dried. The residue was then resuspended in
1.0 ml of water.

SPE of extracts: The C-18 solid phase column (Burdick & Jackson) was
conditioned before use by sequential washing with 3 ml of distilled water, 3 ml of
methanol, and finally Z ml of distilled water. The residue previously resuspended in 1.0
ml of water was loaded on the conditioned solid phase extraction system (SPE).
Following passage of the sample through the SPE column, washes were performed first
with 5 ml of water to remove polar materials, and then with 70% acetone and 30%
methanol to elute a secondary metabolite-containing fraction which was then
freeze-dried. This organic fraction was dissolved in 300 µl of 50% acetonitrile-distilled water.

Chemical analysis: Chemical analysis of the organic fraction from the SPE
column was performed by HPLC-ES-MS (Waters, ZQ systems). The extracts (50.0 µl)
were separated on a C18 symmetry analytical column (2.1 X 150 mm) with HPLC 2690
system (Waters) using a 60-min linear gradient from 30% acetonitrile-5 mM ammonium
acetate to 95% acetonitrile-5 mM ammonium acetate at a flow rate of 150 µl min⁻¹. UV
and visible light absorption spectra (220 to 500 nm) were acquired with a PDA (Waters)
by using the column effluents prior to their analysis by ES-MS. The electrospray source
was switched between positive ion mode and negative ion mode at 0.3 s intervals to
acquire both positive and negative ion spectra. The cone voltage was 25.0 V. The
capillary was maintained at 3.0 V. The source temperature was kept at 100°C. The
desolvation temperature was kept at 400°C and the desolvation gas flow was
479 litre h⁻¹. The data collection and analysis were performed with MassLynx v3.5
program (Waters).
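
For orientation, the nominal mobile phase composition at any time during the linear gradient, and the total volume delivered, follow from the figures above. The sketch below ignores the instrument's dwell volume and column dead time, which the text does not specify, so it gives programmed values rather than the composition actually reaching the detector. The function name is illustrative.

```python
def acetonitrile_percent(t_min, start=30.0, end=95.0, duration=60.0):
    """Programmed % acetonitrile at time t_min of a linear gradient."""
    if not 0.0 <= t_min <= duration:
        raise ValueError("time lies outside the gradient")
    return start + (end - start) * t_min / duration


FLOW_UL_PER_MIN = 150.0
total_volume_ml = FLOW_UL_PER_MIN * 60.0 / 1000.0  # volume delivered in 60 min
midpoint = acetonitrile_percent(30.0)
```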

Figure 8 is an HPLC-ES-MS analysis of rosaramicin showing a UV spectrum at a
retention time of 24.4 minutes and an MS spectrum showing a molecular ion consistent
with rosaramicin at retention time 24.4 minutes (mass of 582.57 [M+H]+).

The present invention is not to be limited in scope by the specific 
embodiments described herein. Indeed, various modifications of the invention in
addition to those described herein will become apparent to those skilled in the art from 
the foregoing description and the accompanying figures. Such modifications are 
intended to fall within the scope of the appended claims. 

It is further to be understood that all sizes and all molecular weight or mass 
values are approximate, and are provided for description. 

Patents, patent publications, procedures and publications cited throughout 
this application are incorporated herein in their entirety for all purposes.

1. An isolated, purified or enriched nucleic acid comprising a nucleic acid sequence
selected from the group consisting of:

(a) a nucleic acid of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25,
27, 29, 31, 33, 35, 37, 39;

(b) a nucleic acid encoding a polypeptide of SEQ ID NOS: 2, 4, 6, 8, 10, 12,
14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38;

(c) a nucleic acid having at least 75% homology to a nucleic acid of (a) or (b)
as determined by analysis with BLASTN version 2.0 with the default
parameters;

(d) a nucleic acid complementary to a nucleic acid of (a), (b) or (c).

2. An isolated, purified or enriched nucleic acid capable of hybridizing to a nucleic 
acid of claim 1 under conditions of high stringency. 

3. An isolated, purified or enriched nucleic acid capable of hybridizing to the nucleic 
acid of claim 1 under conditions of moderate stringency. 

4. An isolated, purified or enriched nucleic acid comprising the sequence of at least 
two nucleic acids of claim 1. 

5. An isolated, purified or enriched nucleic acid comprising the sequence of at least 
three nucleic acids of claim 1. 



6. An isolated, purified or enriched nucleic acid comprising a nucleic acid that 
hybridizes under stringent conditions to any one of rosaramicin open reading frames 
(ORFs) 1 to 19 (SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 
35, 37, 39) and can substitute for the ORF to which it specifically hybridizes to direct the 
synthesis of a rosaramicin compound or analogue. 

7. An isolated, purified or enriched nucleic acid that hybridizes under stringent 
conditions to any one of rosaramicin ORFs 1, 2, 4, 5, 6, 7, 8, 9, 10, 12, 14 or 15 (SEQ 
ID NOS: 3, 5, 9, 11, 13, 15, 17, 19, 21, 25, 29, and 31) and can substitute for the ORF 
to which it specifically hybridizes to direct the synthesis of a rosaramicin compound or 
analogue. 

8. An isolated nucleic acid of claim 1 that hybridizes under stringent conditions to a 
nucleic acid encoding a polypeptide selected from the group comprising SEQ ID NOS: 
2, 4, 6, 8, 10, 12, 14, 16, 18, 20. 

9. The isolated nucleic acid of claim 1 that hybridizes under stringent conditions to a 
nucleic acid encoding a polypeptide selected from the group consisting of SEQ ID NOS: 
22, 24, 26, 28, 30, 32, 34, 36, 38. 

10. An isolated gene cluster comprising ORFs encoding polypeptides sufficient to 
direct the synthesis of a rosaramicin compound or analogue. 

11. The isolated gene cluster of claim 10 wherein the gene cluster is present in a 
bacterium. 

12. The isolated gene cluster of claim 10 wherein the gene cluster contains a nucleic 
acid of any one of rosaramicin ORFs 1 to 19 (SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39) present in the E. coli strains DH10B having 
accession nos. IDAC 100702-1, 100702-2 and 100702-3. 

13. An isolated polypeptide comprising a polypeptide sequence selected from any 
one of: 

(a) a polypeptide of any one of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 
22, 24, 26, 28, 30, 32, 34, 36, 38; and 

(b) a polypeptide which is at least 75% identical in amino acid sequence to a 
polypeptide of any one of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 
22, 24, 26, 28, 30, 32, 34, 36, 38 as determined by analysis with BLASTP 

with the default parameters. 

14. The isolated polypeptide of claim 13 wherein the polypeptide sequence is selected 
from any one of: 

a) a polypeptide of any one of rosaramicin ORFs 1, 2, 4, 5, 6, 7, 8, 9, 10, 12, 
14 or 15 (SEQ ID NOS: 2, 4, 8, 10, 12, 14, 16, 18, 20, 24, 28 and 30); and 

b) a polypeptide which is at least 75% identical in amino acid sequence to a 
polypeptide of any one of rosaramicin ORFs 1, 2, 4, 5, 6, 7, 8, 9, 10, 12, 
14 or 15 (SEQ ID NOS: 2, 4, 8, 10, 12, 14, 16, 18, 20, 24, 28 and 30) as 
determined by analysis with BLASTP with the default parameters. 

15. A polypeptide comprising at least two polypeptides of claim 14. 

16. A polypeptide comprising at least three polypeptides of claim 14. 

17. A polypeptide comprising at least five or more polypeptides of claim 14. 

18. An expression vector comprising a nucleic acid of claim 1. 

19. A host cell transformed with an expression vector of claim 18. 

20. The host cell of claim 19, wherein the cell is transformed with an exogenous 
nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the 
assembly of a rosaramicin compound or analogue. 

21. A method of chemically modifying a biological molecule that is a substrate for a 
polypeptide encoded by a rosaramicin biosynthesis gene cluster, said method 
comprising contacting the biological molecule with a polypeptide of claim 13, wherein 
said polypeptide chemically modifies said biological molecule. 

22. The method of chemically modifying a biological molecule that is a substrate for a 
polypeptide encoded by a rosaramicin biosynthesis gene cluster, said method 
comprising contacting the biological molecule with at least two different polypeptides of 
claim 13. 

23. An isolated or purified antibody capable of specifically binding to a polypeptide 
having a sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 
12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38. 

24. A method of making a polypeptide having a sequence selected from the group 
consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 
36, 38 comprising introducing a nucleic acid encoding said polypeptide, said nucleic 
acid being operably linked to a promoter, into a host cell. 

25. A method of making a rosaramicin compound or analog comprising the step of 
providing a bacterium containing a gene cluster with sufficient genes to produce a 
rosaramicin compound or analogue and culturing the bacterium under conditions 
allowing for expression of the sufficient genes to produce a rosaramicin compound, 
wherein the gene cluster contains at least one nucleic acid of claim 1. 

26. A method of making a rosaramicin compound or analog comprising culturing a 
Micromonospora carbonacea bacterium under conditions allowing for expression of 
rosaramicin ORFs 1 to 19 (SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 
29, 31, 33, 35, 37, 39) present in the E. coli strains DH10B having accession nos. IDAC 
100702-1, 100702-2 and 100702-3. 

27. A computer readable medium having stored thereon a sequence selected from 
the group consisting of a nucleic acid code of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51 and a polypeptide code 
of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 
42, 44, 46, 48, 50. 

28. A computer system comprising a processor and a data storage device wherein 
said data storage device has stored thereon a sequence selected from the group 
consisting of a nucleic acid code of SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 
25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51 and a polypeptide code of SEQ ID 
NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 
48, 50. 
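
Claims 1(c) and 13(b) above set a 75% threshold "as determined by analysis with" BLASTN or BLASTP. The following is a minimal sketch of what a percent-identity check over two already-aligned sequences could look like. It is not BLAST and does not reproduce BLAST's local alignment, scoring or default parameters; the function names `percent_identity` and `meets_threshold` and the toy sequences are assumptions introduced here for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: both strings come from the same pairwise alignment, use '-'
    for gaps, and have equal length. Columns where both rows are gaps are
    ignored; a residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 75.0) -> bool:
    """True when identity is at least `threshold` percent (75 as in the claims)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the sequence listing.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
    print(meets_threshold("ACGT-CGT", "ACGTACGA"))
```

In practice the alignment itself would come from the BLAST program named in the claim, and the identity figure reported by that program, rather than this simplified count, would be the one compared against the threshold.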
Figure 8A 

DIAVIGMSCRLPG-APSIEEFWDLLCSGRSAVDRQP-DGGWR 

PWWGMGCRFPGGWCAEGLWDLVLGGGDAVSGFPVDRGWDVEGLFDPVRGWGKSYVR 
PIAXVGMACRYPGGADTPEKLWDLLLAGADVIGPAPDDRGWDVDSFFDPVPGAAGKSYAR 
PIAIVOISCRLPGGVSTPEDLWRLVEAGTDAISGFPDDRGWDVGRLYDPDPDSTGTSYVR 
PVVWGMGCRFPGGWCAEGLWDLVLGGGDAVSGFPVDRGWDVEGLFDPVRGWGKSYVR 
PIAVIGMACRFPGGVDAPDDLWDLLAKGRDAISRFPTNRGWDVDGLYDPDPEAPGRTYVR 
PIAIVGMACRYPGGVGSPEELWELVASGTDAISPFPDDHGWDGDALYDPDPEAAGRTYCR 
PIAWGMACRYPGGVSSPEDLWRLVETGTDAIGGFPTDRGWDVDAVYDPDPESRNTTYCR 
••«• • ■ • •••• • 

— AVIDGKGESDAAFFGMSPRQAAAVDPQQRLMLELGWEALENARIRPADLKGSDTGVFV 
EGGFVYDAGMFDAEFFGVSPREAVAMDPQQRLFLEVSWEALERAGIDPLGLRGSRTGVYV 
EGGFVYDAGMFDAEFFGVSPREAVAMDPQQRLLLETSWEALERAGIDPAGLRGSRTGVYS 
EGGFLYDCAEFDPEFFTVSPREALAMDPQQRLLLEAAWETFERAGIAPDSARGTRTGVYV 
EGGFVYDAGMFDAEFFGVSPREAVAMDPQQRLFLEVSWEALERAGIDPLGLRGSRTGVYV 
EGGFLHDAPDFDAAFFGISPREALAMDPQQRLLLETTWESLERAGLDPTALRGTRTGVFV 
EGGFIiAGVGDFDAAFFGISPREALAMDPQQRLLLETSWEALERAGIPPDSLRGSRTGVCV 
EGGFLAGAGDFDAAFFGVSPHEAWMDPQQRLLLEVSWEALERSGTDPHSLRGSRTGVYV 

• ••• • »•»•• • •••• 

gltaddyatllrrsgtpisghtatglnrsltanrlsyllglrgpsftvdsaqssslvavh 
gvmgqeygprlvesgggfegylltgtspswsgrvsyvlglegpsisvdtaHssslvalh 
glthqeyaarlheapqelegylltgksvsvasgrvsyvlglegpsisvdta ■ ssslvalh 

GVMYDDYGSRLSEVPKDLEGYLVNGSAGSVASGRIAYTLGLQGPAVTVDTA • SSSLVALH 
GVMGQEYGPRLVESGGGFEGYLLTGTSPSWSGRVSYVLGLEGPSISVDTA ■ SSSLVALH 
GTNGQHYMPLLRDGADDFDGYLGTGNSASVMSGRLS YVFGLEGPAVTVDTA J SASLVALH 
GAWHGGYTDWGQPPAELEGHLLTGGWSFTSGRISYALGLEGPALTVDTA ■ SSSLVALH 
GAAHQGYAVDAGQVPEGAEGFRLTGSADAVLSGRISYLLGLEGPALTVETAgSSSLVAVH 
* * ,* :.*:;* :**.**:.:*::* *:****:* 

LACESLLRGESAVAWGGVSLILAEESTAAMARMGALSPDGRCFTFDARANGYVRGEGGV 
LACQGLRLGECDVALAGGVTVIAAPGLFVEFSRQGGLSGDGRCRAFAGGADGTGWGEGAG 
LACQGLRLGECDVALAGGVTVIAAPGLFVEFSRQGGLSGDGRCRAFAGGADGTGWGEGAG 
LAVQALRSGECELALAGGATVLATPTMFVDFARQRGLAEDGRCKAFADAADGTGFGEGVG 
LACQGLRLGECDVALAGGVTVIAAPGLFVEFSRQGGLSGDGRCRAFAGGADGTGWGEGAG 
LAVQALRRGECTLALVGGATVMSTPDMLVEFSRQRAMSPDGRSKAFAAAADGVALSEGAA 
LAVRALRQGECDLALAGGATVLASPAVFVQFSRQRGLAPDGRCKAFADSADGFGPAEGVG 
LAVQALRRGECGLALAGGVAVMPDPAAFVEFSRQRGLAADGRCRAFGAGADGTGWAEGVG 
** * ** ** ... . ***. :* *:* .** 

VWLERLSVARERGHRVLAWRGSAVNQDGGSNGLTAPSGVAQRRVIGAALVAAGLGVSD 
MLLVERLSDAVRNRRQVLAWRGSAVNQDGASNGLTAPNGTAQQLVIRQALTNAGLAADE 
VWLERLSVARERGHRVLAWRGSAVNQDGGSNGLTAPSGVAQRRVIGAALVAAGLGVSD 
MMWQRLADAEAAGHEILAWKGSAVNQDGASNGLTAPNGPSQERVIRQALADAGLRPDQ 
MLWERLSDAVRHGRRVLALVTGTAVNQDGASNGLTAPSGPAQEKVLRQALVDARVTAAD 

VLVLQRLSDAVRDGRWVLGVIRGSAVNQDGASNGLTAPSGPAQQRVIRQALTDARLGADO 
.... * * ... * ***.** ** * :: * * : 

VDYVELfflGTGTKAGDPVEAAALGAVLGVARGCDNPLAVGSVKTNVGjSLEGAAGIT^ 
VDWEaHgTGTRLGDPIEAEALLGSYGRGRVGG-ALLLGSVKSNIG I TQAAAGVAGVIKM 
VDVVE7U|GTGTRLGDPIEAEALLGSYGRGRVGG-ALLLGSVKSNIg| 
VDAVEAgGTGTRLGDPIEAQALLATYGQGRPADRPLLLGSLKSNIGOTQAAAGVAG^ 

[G 

LG|r±^ 

VDAVEAgGTGTRLGDPIEVRALMNVYGAGRPADRPLWLGSLKSNIG I TQAAAGVGGVIKT 
IDAVEaBgTGTRLGDPIEAQALIAAYGADRTPDRPLWLGSLKSNIG I AQAAAGVGGLIKM 



VDA^; 



IGTGTRLGDPIEAEALLGSYGRGRVGG-ALLLGSVKSNIG I TQAAAGVAGVIKM 

itgtalgdpieaqallatygrdrpagrplwlgslksnigStqaaagiagvmkv 
Figure 8B 

Figure 9A 

ORF5 | AT1 -VVPVWSGRSVGaLRAYAGRLREVCAGLSDGGGSGGGSGLVDVGWSLVSSRSVPEHRAV 
ORF5 | AT2 -WPVWSGRSTAALRAYAGRLREVCAGLSDG AGLVNVGWSLVSSRSVFEHRAV 

WO 03/010193 PCT/CA02/01177 

Figure 9B 

S SAHVEAVEGMLSGLLGGLCPGRGVVPFYSSWGGWIXSVGL-DGGYW^^ 

s sahveavegmlsgij^ggix:pgrgvvipfysswggvvix;vgl-dggywyri^ 

F SVRMDGMIJVEFEKAMGDLRAGEPTIPWANV^ 

q SAQVEVLKDHM^AALAFVSPRSSQIPFYSTVTGGIjLDTALL 

S SAHVEAVEGMLSGLLGGLCPGRGVVPFYSSWGGVVIX?7GL-DGGYWYRNLRERV^ 
S STQVDRLRAELLTVLGPVDARPAQVPFYSTVQGGRVDTAGL-DAGYWYRNLRGQVRFE 
F SPHVEAMLEPFRRVARGLTYHAPTIPWSNATGRLATADALRDPGYWVRHVRQPVRFR 
B* :::: : : * ***.*** 

DWGRLVGDGFSGFVECSGHPVLAGGVLESVA WDPDVRPVWG 

DWGRLVGDGFSGFVECSGHPVLAGGVLESVA WDPDVRPVWG 

DWGRLVGDGFSGFVECSGHPVLAGGVLESVA WDPDVRPVWG 

DGMRALRAEGVDTFVELGPDGVLTAMARDCLADPADPVDLADAAEPAGAAEPDRSLLFLP 

QATRAMLADGHEGFIiEPSPHPMLSVSLQGTAA DAG- -VAATVLG 

DWGRLVGDGFSGFVECSGHPVLAGGVLESVA WDPDVRPVWG 

BTVRVLLDDGHRAFVEAAAHAVLVPAIQELGD —SAG — VRWAVG 

DGVRAARDQGATAFVGLGPDGVIiCALAEECLG ' PTGDVLLLP 

•••••• : • 

SLRRDDGGWGRFDTSVGEAFVGGMSVDWKGVFAGAGARLVDLPTYPFQRRHYWAPN 
SLRRDDGGWGRFLTSVGEAFVGGMSVDWKGVFAGAGARLVDLPTYPFQRRHYWAPT 
SLRRDDGGWGRFLTSVGEAFVGGMSVDWKGVFAGAGARLVDLPTYPFQRRHYWAQT 
TLRRDRDDAVAVREALASVHVHGLPVDP-VAPLGDGPLATDLPTYPFQRSRYWL— 
TLRRGKGGARWFGMALGLAHAHGIEIDAS-VLFGTDSRRVDLPTYPFQRERFWYHP 
SLRRDDGGWGRFLTSVGEAFVGGMSVDWKGVFAGAGARLVDLPTYPFQRRHYWAPT 
SLRREAGGLDRLLASAAEAFTQGVAVDWSRALAGAARVAVDLPTYAFQRQRYWLEP 
VLRPGRPEPATLIAALAGAYAGGAEMDWSRVFAGTGARRVELPTYAFQHRRYWL-- 

*4r . * .* * .**** **. 
Figure 10 

[Row labels in each alignment block, top to bottom: ORF5 | DH3, ORF6 | DH1, ORF7 | DH2] 

AGLSGADHPLLGGAVELPDRGGHVYPARLGVRHHPTOiGEgflALL e AAI 

AAAAGCPEVEELRLEAPLWPARGGVRLQVLVDDPDDGSDRRAVSVFS 
RAEVGCTRVAELTFEAPMVLADDGGVRVRVWDGPD-ADGARQVRIHS 
GRRDGAGRIEELTLDAPLWADESAAQLRLWGPAD-AEGRRQLTVHS 

***••**•*• ••••»* * 



ITAFVEIiALHA 

talleimhrv 
Sgaayaelalw;^ 
Figure 11 

ORF7 | ER2 QIiAViyVGAVHVPKLVRHRPRPDGPLTPPAGAAWRiy^GGQGTLEGr^ 
DEBS | ER_mod4 QLALRGDDVFVPRLS PLAPSALTLPAG-TQRIiVPG-DGAIDSVAFEPAPDVEQPLRA 
***:*.. *.**** * ** *** . * .*. **** 

ORF7 | ER2 GQVRVAVRAAGVNFRDTLIALGMYPGTP\rLGAEGAGVITEVAPDVAGFAPGDRV^ 
DEBS | ER_mod4 GEVRVDVRATGVNFRDVLLALGMYPQKADMGTEAAGVVTAVGPDVDAFAPGD 
*.*** ***.*******.****** ^ * *** *********.. * 

ORF7 | ER2 GLGPVAVADARMLARVPRGWSYAEAASVPAVFLTAHYAIjTRriAGIRPGQS: 
DEBS | ER_mod4 AFAPIAVTDHRLLARVPDGWSDADAAAVPIAYTTAHYALHDLAGLRAGQSVgi 

ORF7 | ER2 rlqlsrhlgvevyatasrgkwdtlrglglddahiadsrsldfagrflaatggrgvdv 
DEBS | ER_mod4 waiSrragaevlatagpakhgtlralglddehiassretgfarkfrertggrgvdv 
**H: : *□* • * ** *** * *** ***** *** ** ** .* ******** 
HH H •••• 

ORF7 | ER2 VLNSLAGDFVDASLRLLPRGGHFIiELGKADVRDPDRIAADHPGVGYRAFDLVEAGPELVG 
DEBS | ER_mod4 YLNSLTGELLDESADLLAEDGVFVEMGKTDLRD AGDFRGR-YAPFDLGEAGDDRLG 
*****.*...* * ** * *.*.**. 4r . **. * * * * *** *** . .* 

ORF7 | ER2 QLLGELMELF AAGVLS PLPLTVRDVRRAREAFRLI SQAR 
DEBS | ER_mod4 EILREWGLLGAGELDRLPVSAWELGSAPAALQHMSRGR 
• • * **• •*• * 
..* *• ** * **, 
Figure 12 

[Row labels in each alignment block, top to bottom: ORF5 | KR2, ORF5 | KR3, ORF6 | KR1, ORF7 | KR1, ORF7 | KR2, ORF8 | KR1, ORF9 | KR1] 

PRGNILVTG@Tg 
PDGTVLLTGA' 

AAGTVLVTi 
ARGTILWGDTgPVAAL: 
AYGTVLVTGgT STL@GA\ 
PRGTVLVTGJT 
PRGTVLVTGgT 



fiALe 
EALgAHl 



RWLARN-GAEHLVLTSRRGADAPGAAELEADLRALGVEVTM 
LiVTTRGARRLLLVSRSGPDAPDAGRLTEELTGLGAHVTL 

jRHLVRRHGVRRLLIiVGRRGPDAPGAAALTRELEELGASVRV - 
SgRLLGD-GAAHWLAG PAAASTVGLTGGADRVAL 

Irhlvarhgvrhlvltgrsgpaadgasalvdeltasgasvtv 

Srrlaag-gaahlvltsrrgadapgaaglvgelralgaevtv 
Irwlarn-gathlvltsrrggnapgvaalraelvtlgaevtv 

* * ... * * * • . 



aacdvadraalsdvla ahpptavfhtagvlhdgvidtlaaghidevfrpktaaalili 

aacdttdraaiiagvlggipaehpltawhvagvlddgavqaltpervdavlrpkvdaalh 
aacdvgdrgavtrllagvpaahpltawhsaglpddgvltaqtgervaavlrakadaavn 

idcdpsdrdalagllg ayrpttiwappavaltalaettpedfvaavaaktttavh 

vacdaadrvalrrlldgipaahpltawhaagvlddatitaltagqvdavlrpkadavin 
avcdvadraavaallaglpadaplsavfhtagvahsmpigetgltdvaevfagkvagarh 
vacdvadreavAgllagipraapltavfhaagvpqvtplhettpelfaqvcagkvagavh 

★****..* :::«. : 

LDELTQH--QELDAF\n:jFSSVTGVWGNGGQAAyAAA]SIASLDAIAERR^ 

LHELTAG--LPLAAFVLFSGAAGILGRPGQANYAAANTFLDALAQHRRARGLP6VSLAWG 

LHELTRH—LDLTAFVLFSSVAGTIGSAGQAGYAAANAFLDAFASWRQGQGLPATALAWG 

LDALAAEAELELDAFWFSSVSGTWGGAGHGGYAAGTARLDALVEERRARGLPATAIAWT 

LHELTRD--RELSAFVLFSSAAALFGSPGQGNYSAANGFVDAFAQYRRAQGLHAVSLAWG 

LDELTRG— HDLDAFVLYSSNAGVWGSSGQSAYGAANAALDALAERRRAAGLTATSVAWG 

LHELAG DLDAFVTFASAAGVWGSGGQCAYAAANAALDALAERRRAAGLPATSVAWG 

• • ■ « •••• 

LWGGGG-MAEGIG--- EQNLNR RGITALDPELGIAALQQ 

LWGLASDMTGHLG EQDLRR MRRSGIAPMTGEEGLALFDL 

— PliDGGMAAGLG TADVAR LRRSGLVPLGVDDALVLFDA 

PWADATTAAGGQAPDASAGGHEPDTRAGGPDRELLRRGGIiTPLDPGAALDVLRG 

LWADSSRMAGHLD QEGMRRR — ^ MARGGVLPLTTDQGLALFDA 

LWGSGG-MGEGDA EEYLSR RGLRPMPPERGVDALLA 

VWGGPG-MGAGAG EEYLRR RGVRAMPPAAALAALGR 
Figure 13 

[Row labels in each alignment block, top to bottom: ORF5 | ACP1, ORF5 | ACP2, ORF5 | ACP3, ORF6 | ACP1, ORF7 | ACP1, ORF7 | ACP2, ORF8 | ACP1, ORF9 | ACP1] 

AWLGLDSAQAVDPERTFKEHGF 
AAVLRHETVDAVAPTRAFKDAGFDE 
AGVLAIiREAADVDPGRPFREVGFD 
AAVLGHGEAAMLSTQRAFRDAGFDg 
AAVLGHDEAEAADPDRAFRELGF' 
AAVLGHADPQAVDADRAFRELGFD 
AAVLGHPGPEHVGPDAAFREIGFD S 
AGVLGHDGADDVPADAEFSALGFD 
* ** * ** 



'AVEIiCNHLQRGTGLRVPASLVYNHPTPMAAARKLQ 
iTALELRNHIiNSTTGLSLPPTWFDHPTPSTLAKFLE 
iTAVELRNRLGSATGLRIiAPSLVFDHPTPSAVAEHLV 
iTAVDLRNRIiGAATGLSLPAAWFDHPTPAALAAYLR 
•AVDLRimiNAATGIiNLPASVVFDHPSARVLAAYLR 
.TAVELRNRLATASGLRLPATLVFDHPTPEALAEHLL 
rTAVDLAKRLRAAVGVPLSATLVFDHPTATAVAEHLA 
LAAVQLRRRLAEATGLSLSAPVLFDHRTPDALAAHLH 
Figure 14 

ORF9 | Te TGAGGPMiVCCAGTAAASGPREFTAFAAALAGLRDVTVIiPQTGFLPGEPIiPAGIjDVriLDA 
DEBS | Te DGPGEVWICCAGTAAISGPHEFTRIiAGALRGIAPVRAVPQPGYEEGEPLPSSMAAVAAV 
ie * ...*****★* ***;*** :*.** *; * .:**.*: *****..: 

ORF9 | Te QADAVLAHCAGGPFVL^ 
DEBS | Te QADAVIRTQGDKPFWi 
***. a* 

ORF9 | Te TVRLEARGADPAALVLMDIYTPAAPGAMGVW 
DEBS | Te iMAYALATELLDRGHPPRGWLIDVYPPGHQDAMNAW 
*H* **.**- * ** * .**•*.* * .**. * 

ORF9 | Te REEjyOijAWAERSWPVDDTRLTAMGAyHRLLLDWAPRPTRAPVLHLYAGEPAGAWPDPRQ 
DEBS | Te LEELTATLFDRETVRMDDTRLTALGAYDRLTGQWRPRETGLPTLLVSAGEPMGPWPD — D 
* . .* * ** .* ** * * * . **** * *** . 
•••••• • 

ORF9 | Te DWRSRFDGAHTSAEVPgTgFSMMTEHAPWAATVHKWLD^ 
DEBS | Te SWKPTWPFEHDTVAVI^DfflFTMVQEHADAIARHIDAWLGGGNS 
Figure 15 

[Row labels in each alignment block, top to bottom: ORF7 | AT2, TYLO | AAB66506 | AT2, NIDD | AAC46026 | AT2] 

WPLFLSARGSAALCAQAARLRAiUiIEEPDLDIAEVGYTLAATRARFEHRAVVIGES^ 
AAB66506 | AT2 TVPLLLSGHTEAALREQSTRLLNDLLEHPDBHPADVGYTLITGRAHFGHRAAVIGESREE 
AAC46026 ) AT2 TWLMLSAHSEAALREQARRLCAQLLARPEQRPADVGHALLSTRARFPRRAAW^ 

****.**_. *** *. ** *. . **.* .**★.*** * 

^T2 VGDALAALARGEEHPSLLRG— RAGASDRVAFVFPGQGSQWAEMADGLLDRSPAFRASAS 

AAB6 65 0 6 | AT2 LLDALKALAEGREHHTVVRGDGTAHPDRRVVP^PGQGSQWPSM^ 
AAC46026 1 AT2 LAEALDAVAEGGPHPLAATG--TAGTADRVVTVFPGQGSQWAGMAEGLLERSGAFRSAAD 
: :** *:*.* * * * ** ********** ** **.*. *** .* 

AT2 ACDEALRAHLDWSVLDVLRRVPDAPALSRVDWQPVLFTMMVSLAAAWRALGVHPSAW 
AAB6 65 0 6 | AT2 ACDAALSVHLDWSVLDVLQEKPDAPPLSRVDWQPVLFTMMLSLAACWRDLGVHPAAWG 
AAC4 602 6 1 AT2 SCDAALRPYLGWSVLSVLRGEPDAPSLDRVDWQPVLFT^IMVSLAAVWRALGVEPAAW 
.** ** .*★*****. ******************.**** ** ****.**** 

^T2 HSQGEIAAAHVAGGLSLDDAARIVALRSQAWLRLAGQGGMVAVSLPVDALRARLARFGDR 
AAB6 65 0 6 | AT2 HSQGEIAAACVAGALSLEDAARrVALRSRAWLTLAGKGGMAAVSLPEARLRERIERFGQR 
AAC4 602 6 ( AT2 HSQGEIAAAHVAGALSLDDSARIVALRSRAWLGLAGKGGWAVPMPAEELRPRLVTWGDR 
********* ******.*.********.*** ***.*****.* ** *. :*.* 

AT2 LSVAAVNSPGTAAVSGYPDALAELVDELTAEGVHAKAIPGVDTAGHSAQVEVLKDHL^^ 
AAB66506 I AT2 LSVAAVNSPGTAAVAGDVDALRELLAELTAEGIRAKPIPGVDTAGHSAQVDGLKEHLFEV 
AAC46026 I AT2 LAVAAVNSPGSCAVAGDPEALAELVALLTGEGVHARPIPGVDTAGHSPQVDALRAHLLEV 
*.********.**.* .** ****..*.************. 

0RF7 I AT2 LAFV'SPRSSQIPFYSTVTGGLLDTALLDAAYWYRNMRDPVEFEQATRAiyn^ 

TYLO I AAB66506 I AT2 LAPVSPRSSDIPFYSTVTGAPLDTERLDAGYWYRJMIEPVEFEKAVRALIADGYDLFLEC 

NIDD I AAC4 6 0 2 6 I AT2 LAPVAPRPADI PF YSTVTGGLLDGTELDATYWYRNMREPVEFERATRALIAD^^ 

****.**. .*********^ *******.*****.***,.***., *** 

AT2 SPHPMLSVSLQGTAADAGVAATVLGTLRRGKGGARWFGMALGLAHAHGIEIDASVLFGTD 
AAB66506 | AT2 NPHPiyff^SLDETLTDSGGHGTVMHTLRRQKGSAKDFGMALCLAYVNGLEIDGEALFGPD 
AAC4 6 0 2 6 I AT2 SPHPMLAVALEQTVTDAGTDAAVLGTLRRRHGGPRALALAVCRAFAHGVEVDPEAVF6PG 
.*****:::*: * :*:* .:*: **** :*. • : * .*.*.* .** 



AT2 SRRVDLPTYPFQRERFWYHP 
AAB66506 | AT2 SRRVNPPTYPFQRERYWYHP 
AAC4 602 6 | AT2 ARPVELPTYPFQRERYWCHP 
-* *. *********.* ** 
SEQUENCE LISTING 

<110> ECOPIA BIOSCIENCES INC. 
Farnet, Chris 
Yang, Xianshu 
Staffa, Alfredo 

<120> GENES AND PROTEINS FOR THE BIOSYNTHESIS OF ROSARAMICIN 
<130> 3016-3PCT 
<160> 39 

<170> PatentIn version 3.0 

<210> 1 
<211> 60196 
<212> DNA 
<213> Micromonospora carbonacea subspecies aurantiaca 

<400> 1 
gtgccagttc cgacacagga ggcccccttg cggaacagcc cgccgccagc ccattcgcag 60 

ctcgtcctga gcgaggtcac gaagcactac gccgagcggg tcgtcctgga ccgcgtttcg 120 

ctcaccgtca agccggggga gcgggtcggc gtcatcggcg agaacgggtc ggggaagtcg 180 

accctgctgc ggctcgtcgc ggggctggag acgccggaca acggcgagtt gaccgtctcg 240 

gcgcccgggg gcatcggcta tctcgcccag cggcttcggc tgccggccgg cggcagcacc 300 

gtacgggatg tggtggacca cacgctcgcc gacctgcgag acctggaggc gcggttgcgc 360 

gccgccgagg cggacctggc caccgccacg cccgagcagt tggacgccta cggcacgctg 420 

ctcactgtgt tcgaggcccg cggcggctac caggccgacg cccgggtgga cgccgccctg 480 

cacggtctcg gcctggccga gctcgaccgc gatcgcgacg tcgacacgct ctccggcggg 540 

gaacggtccc ggctcgcgct cgccgcgacc ctggccgccg cgccggaact gctgctgctc 600 

gacgagccca ccaacgacct cgacatcgag gccgtggagt ggctggagga tcacctgcgg 660 

tcgcaccggg gcaccgtcgt cgtggtcact cacgaccggg tgttcctgga gtcggtcacg 720 

tccaccatcc tcgaggtcga caccgacacc cgggccgtgc accggtacgg cgacggctat 780 

gccagctacc tgcgggccaa ggccgccctc cgggagagcc gggagcgcgc gtacgcggaa 840 

tgggtggccg aggtcgagcg gcagtcccaa ctcgcggagc gggccgggac gatgctccgg 900 

tcgatctccc gcaagggacc ggctgcgttc agcggggccg gtgcccaccg ctcccggtcg 960 

tcgtcgacgg cgacgtcacg caaggcccgc aacgccaacg agcggcttcg ccggctgcgg 1020 

gagaatccgg taccgcgacc cgccgacccg ttgcgcttca ccgcgtcggt cgccccggat 1080 
gccacggacg ccgatacccg ccgcgtcgag ttgaccgacg tccgggtggg ccgccgcctg 1140 

cacgtgcccg agctgaccat cggacccgcc gaacggttgc tggtgaccgg acccaacggc 1200 

gcgggtaaga gcaccctgat gcgggtgctc gccggggaac tcgtgcccga cggcggaacg 1260 

gtgcggctgc cggctcggat cggccacctg cgtcaggacg tgacggtcgg gcagcccggg 1320 

cgctctctgc tggagacgta cgcgtcgggt cggccggggc atcccgagga gtacgcggag 1380 

gagttgctcg cccgcggtct gttccggccc gatgacctgc gcatgccggt cgggacgctc 1440 

tccgtcgggc agcgccgccg gatcgacctg gcccggctgg tcgcccgccc ggccgacctg 1500 

ctgctgttgg acgagcccac caaccacttc gcgcccctgc tcgtggagga gctggaacag 1560 

gcgctggacg gctacgccgg agcgctggtc gtggtgacgc acgaccggcg gatgcggagc 1620 

accttcaccg gggctcggct ggaactgcac cagggcgtgg ccaccggggc gagccgggcc 1680 

tgacgagccg cccggggtgc cgtggcgcgc ccgggacggt ggggatctca gccggggtcg 1740 

gcacccggca ccgccgtgag ggcggtcgtg gacaccgctg cgagggtggt cgtgacctcg 1800 

gcgcacacag cgtcgagctg atcgttgaga tagaagtgcc cgcccgggaa cgtgcggacc 1860 

atcgtggccg ctgcggtcac ctcggcccac gccgcggcct cgtcggtggt gacgtgggtg 1920 

tcggcggccc cggcgagtac ggtgaccggg caacgcagcc tgggccctgg ccggtattcg 1980 

taggcggcgg cggcccggta gtcgttgcgg atggcgggga ggagcatgtc cagcagttcc 2040 

ctgtcgtcca ggaggctgga atcggtgccc tggagccggc ggatctcgtc gatcagctcg 2100 

tcgtcaaacc ggtagaaccg gtcccgccgc ccgacggacg ggctacggcg gccggaggcg 2160 

aagaggtgca cgagccgatc ggcgtcggcc ggtgggagcc ggcgggcggc ctcgaaggcc 2220 

accgtggcgc ccatgctgtg accgaagaag gccaccggtc ggtccgccca ggcgagcagt 2280 

gcgggcagga gcccgtccac cagggcgtcg acggactcga tcaagggttc gccgcggcgg 2340 

tcctgccggc ccgggtactg gaccgccagc acgtccacgt cggcggcgag ccggcgggcg 2400 

aacggcaggt acgcgctggc cgcgcccccg gcgtgcggga agcagaacag ccggacggcg 2460 

gggtcgttga cgggccggta gcggcgtagc cacagctcgg acggatcggc ggacggggac 2520 

atggtgatct gcgctcctcg gtctgctcga cgttccggtg tcggtcccca cccccgcgcc 2580 

gaagacggcc atgatgtcgc gcacggcggc cgtcaccggc tcgacgtctt acttcgggtg 2640 

ccgtccgtcg cgtaccacct ggacggggag gcgtcgcgcg gtgagctggt cggcgtcgta 2700 

gaactcgacc ccgacgtggt cgatccggaa ctcggtgaac tggtcgagcg tctggttgag 2760 
gaagaccttc gcctccagcc tggccaggaa cgcgcccagg cagtggtgga tgccgtggcc 2820 

gaacgccagg tgcttgttcg actcgcgtcg gatgtcgaag gtgtccgggt ccgtgaacac 2880 

ctcggtgtcg cggttcgcgg aggcgatcca ggcgatcacc atctggccct tgcgcatggg 2940 

gtggccgagg atgtcggtgt cctcgttcag gatccggaag atgcagttga acggggaccg 3000 

gtagcgcagc gtctcctcga tcacgcccgg cacgaggctg cggtcggcgc ggaccgcggc 3060 

ctgtgcctgc gggtgctcct ccagcaccag gaacaggttg ctgagcagcg tggcgctgga 3120 

gatgtgcccg gcggtgagca gcagcgcgac gatgttgacg acttcctcgt cggtcagctt 3180 

gcgcccgtcg acctccgccg cacagaggcc gctgatcagg tcgtccttcg gttcggcgcg 3240 

cttgtgggcg atctgggcgt acaggaattc ggaccactcc tcgatggcgg ggcccaccgt 3300 

ctcggtgaag tcgtccggga ggttgggata ctccagccct tcgttgctga ggatgatgtc 3360 

cacccactcg cggaacttct cgtgatcctt ggtgggaatg ccgagcagct cggcgatgac 3420 

cgtcaccggc agcgggtacg cgaggtcgct ggcgatgtcg atccggtcct ggtcgcgtac 3480 

ctggtcgagc acgtcggcgg tgatctgccc gatccgcagc tccatctggg cgatccggcg 3540 

gggggtgaac gcctggctca ccagcttgcg cagcggcgcg tgccgcggcg ggtcgatgcc 3600 

gccgatggtg ccggggccca tcagcagggc cagctccgac ggtacgggaa agaccgaggt 3660 

gaagtccgac gagaagatca gcgggttggt ggtcacggtc tggtagtccc ggtaggagaa 3720 

cacgtgccag gcctgacggg tctcgtccca ggagacgggc cagttcttcc gcatgtacgc 3780 

gaaccagtcc agcagcccct gggcgtcggc gcccttgggc aggtcgatcg gtcccgccgg 3840 

ggcgttcggg gtctgcgtca tggtgtgctc atctcctcgg tggtctcggc cgtcgggccg 3900 

aagggaaaga gaaccttggt tcgcgagggc gtccggtcgg ggaggggatc ttccgggctg 3960 

gcgctgtcac ctgcggcctg ctcggtcgcc tcgccggcat tgacggttgt gctgggcggc 4020 

gagtcagcgc tgtggcggcg ggcagggcgg gccctgcact tctccggggc gtcgtaatct 4080 

tcggtccgaa tcgtgatggc cgcaaggccg gacctgacat agtgctgtct gcaacgctcg 4140 

gagcacccgt tttatcagtt gattgcggtc atttttgtcg acgatcaggg cggttctata 4200 

tcgagacttg acatagtctt ctacggattc gtgacaatga tcatcgatcg gtgttggctg 4260 

aatcgacgaa aggggcgtgc tgttcgaggg ggcgttgcca agatcaatgc aaaaccgcat 4320 

ccttgatcaa tgcggaaccg caccctgcct ggaagagagc tgccatggag catccagtaa 4380 

cggccgggtc ctgcaggttc taccccttca gtgaccgtac cgacctgaat atcgatccca 4440 

cgtacggcga actgcgctcg aaagagccgg tcgcccgcgt ccgcatgccc tacggcgggg 4500 

acgcctggct ggtcacccgg cacgccgacg ccaagaaggc cctctctgac ccccgactca 4560 

gcattgcagc cggagccggg cgggacgtgc cgcgcgcctc cccccgtctc caggaacccg 4620 

acggtctgat gggtcttccc cccgacgcgc acgcccgact gcgcaggctc gtcgccacgg 4680 

cgttcacgcc gaagcgcgta cgggacatcg ccccgcgcgt cgtccagctc gccgacaagc 4740 

ttctcgacga cgtggtcgaa accgggccgc cggccgacct cgtgcagcag ctcgcgcttc 4800 

ccctgccggt gatgatcatc tgcgagatga tgggcatcgg gtacgacgag cagcacctgt 4860 

tccgtgcctt cagcgatgcc ctgatgtcct ccacccgata cacggccgac caggtcgacc 4920 

gcgcggtaga ggacttcgtc gagtacctcg gcggcctcct cgcgcagcgc cgtgcacacc 4980 

gcaccgacga cctcctcggc gccctggtcg aggcgcgaga cgacggcgat cggctgaccg 5040 

aggacgaact cgtcatgctc accggcggcc tgctcgtcgg cggccacgag acgaccgcca 5100 

gccagatcgc ctcgcagatc ttcctcctgc tgcgcgaccg gaccaggtac gagcaactcc 5160 

atgcccgtcc ggagttgatc cccacggcag tcgaggaact gctgcgggtg gccccgctct 5220 

gggcctcggt cggccccacc cgcatcgcca ccgaggacct ggaactcaac gggacgacca 5280 

tccgggccgg cgacgccgtc gtcttctcgc tggcgtccgc caatcaggac gacgacgtct 5340 

tcgcgaatgc cgcagacgtc gtgctcgacc gcgacccgaa tccgcacatc gccttcgggc 5400 

acgggcccca ttactgcatc ggggcgtcac tggccagact ggaaatacag gccgccatcg 5460 

gcgccttggc caggcggctt cccggtctcc gcctggccgt cgaggaaaac gaacttgatt 5520 

ggaacaaggg aatgatggta cgcagcctcg tgtcccttcc ggtgacgtgg tgacccggcc 5580 

cgggcgccgg atcaggtgac gaacggatca gtagcgcatt ggctcggccg gcccggagct 5640 

gacatggcct ggtgaagccg aaaaccatcg gccgccgccg ctgcaagcgc ccgctgggac 5700 

gatgcgagtt gtgggcgcag acgcgtgcag cgcagccgtc cccgccggac cgcggatggg 5760 

cttcccagca tcgttcttcg acccaggaga cctcatgacc gtgcagagtg acgtgttgcg 5820 

ccaccgcgat atcgccgtca tcgggatgtc ctgccggctt cccggcgcgc cgagcatcga 5880 

ggaattctgg gacctgctgt gcagcgggcg gagcgcggtc gaccgccagc ccgacggcgg 5940 

ttggcgggcg gtgatcgatg ggaagggaga atccgacgcc gcgttcttcg gcatgtcccc 6000 

gcgccaggcc gccgcggtcg acccgcaaca gcgcctgatg ctcgaactcg gctgggaggc 6060 

actggagaac gcccgcatcc ggcccgccga cctgaagggc tccgacactg gcgtcttcgt 6120 

ggggctcacc gccgacgact acgccacctt gctgcgccgc tccggcacgc ccatcagcgg 6180 




gcacaccgcg acaggcctga accgtagcct cacggccaac cgtctctcgt acctgctggg 6240 
tctgcgcggc cccagcttca ccgtggactc cgcgcagtcg tcatccctgg tcgccgttca 6300 
cctggcgtgc gaaagcctgc tgcggggcga gagcgcggtc gccgtcgtcg gcggggtgag 6360 
cctcatcctg gcagaggaga gcaccgccgc catggcgcgt atgggggcac tctctcctga 6420 
cgggcgttgc ttcaccttcg acgcccgggc caacggctac gtccgtggcg agggtggcgt 6480 
ggccatggtc ctcaagccgc tgatccgcgc gatcgaggac ggcgaccagg tgcactgcgt 6540 
catccggggc tgtgccgtca acaacgacgg cggtggcccc agcctcaccc atcccgaccg 6600 
ggaggcccag gaggcattgc tgcgccgggc gtacgagcgg gcgggggtgg cccccgaaca 6660 
cgtcgactac gtcgagctgc acggcaccgg gacgaaggcc ggcgaccccg tcgaggcggc 6720 
ggccctcggg gcggtgctgg gtgtcgcccg cggctgcgac aacccactcg cggtcggatc 6780 
ggtcaagacc aacgtcggcc acctggaggg ggcggccggc atcacgggcc tgctgaaggc 6840 
ggtgctgtgc gtacgtgagg gggtgctgcc gccgagcctc aacttccgta cgccgaaccc 6900 

ggacatccgc ctcgacgagc tgaacctccg ggttcagacg gaactgcagc cgtggccggg 6960 

cgacgggacg ggccgcccgc gtgtcgccgg agtgagttcc ttcggcatgg gcggtacgaa 7020 

tgcgcatctg attctcgagc aggctccggt ggcggctgag gaaacggctg ttaccgatgc 7080 

cggtgtcggt tcggttcggg tggttccggt ggtggtgtcg ggtcgttcgg tgggggcttt 7140 

gcgggcgtat gcgggtcggt tgcgtgaggt gtgcgcgggg ttgtctgacg gtggtggctc 7200 

cggtggtggt tctggtctgg tggatgtggg ttggtcgttg gtgtcgtcgc ggtcggtgtt 7260 

cgagcatcgg gcggtcgtgt tcggtggggg tgtcgccgag gtggtggcgg gtttggatgc 7320 

ggtggcttct ggggcggtga gttcgggttc ggtggtggtg ggttcggtgg cgtcgggtgt 7380 

tgctggtggt ggtggtcggg tggtgtttgt gtttccgggt cagggttggc agtgggtggg 7440 

tatgggtgcg gctctgttgg acgagtcgga ggtgtttgct gagtcgatgg tggagtgtgg 7500 

gcgggcgttg tcggggtttg tggattggga tttgttggaa gtggtccgcg gtggtggggg 7560 

tgacggatcg tttggtcggg ttgatgtggt gcagccggtg tcgtgggcgg tgatggtgtc 7620 

gttggcgcgg ttgtggatgt cggtgggtgt ggtgccggat gcggtggtgg gtcattcgca 7680 

gggtgaggtt gctgcgccgg tggtgggggg tgtgttgagt gtggctgatg gggcgcgggt 7740 

ggtggcgttg cggtcgcggg tgatcggtga ggtgttggcg ggtggtggtg cgatggtgtc 7800 

ggtggggttg ccggtggcgg ttgtgttgga tcggttggcg gggtggggtg gtcggttggg 7860 

tgtggcggcg gtgaatggtc cgtcgttgac ggtggtgtcg ggggatgtgg atgctgctgt 7920 
ggggtttgtt ggtgagtgtg agcgggatgg ggtgtgggtg cggcgggtgg cggtggatta 7980 

tgcgtcgcat tcggcgcatg tggaggcggt ggaggggatg ctgtcggggt tgttgggtgg 8040 

tttgtgtccg gggcggggtg tggtgccgtt ttattcgtcg gtggtgggtg gtgtggttga 8100 

tggggtgggt ttggatggtg ggtattggta tcggaatctg cgtgagcggg tgttgttttc 8160 

ggatgtggtg gggcggcttg ttggggatgg gttttcgggg tttgtggagt gttcggggca 8220 

tccggtgttg gcgggtgggg tgttggagtc ggtggcggtg gtggatccgg atgtgcggcc 8280 

ggtggtggtg gggtcgctgc gccgtgatga tggtgggtgg ggccggtttt tgacgtcggt 8340 

gggtgaggcg ttcgtcggcg ggatgagtgt tgactggaag ggtgtgttcg cgggggcggg 8400 

cgcgcggttg gttgacctgc cgacgtatcc gttccaacga cgccactact gggcaccgaa 8460 

caccgacggc gcgccagctc cgatcctcga tgatcacgcg gaggcggaga acgaaccagc 8520 

cgaatccgag ccagggattc gggccgagct tctgacgttg gccgagcccg agcaactgaa 8580 

ccgactcttg gcgaccgttc gcgccagcac cgccgtcgtt ctgggcctcg actcggcgca 8640 

ggcggtcgat ccggagcgca cgttcaagga gcatggattc gaatcggtca ccgccgtcga 8700 
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gttgtcggtg gcgcgggagc gtggtcatcg ggtgttggcg gtggtgcggg gttctgcggt 9660 
gaatcaggat ggtgggtcga atggtttgac ggcgccgtcg ggggtggcgc agcgtcgggt 9720 
gattggtgcg gcgttggtgg cggcgggttt gggtgtgtcg gatgtggatg tggtggaggc 9780 
gcatgggacg gggactcggt tgggtgatcc gattgaggct gaggcgttgt tggggtcgta 9840
tgggcggggt cgtgtgggtg gggcgttgtt gttgggttcg gtgaagtcga atattggtca 9900 
tacgcaggcg gctgcgggtg tggcgggtgt gatcaagatg gtgatggcgt tgcgggcggg 9960 
ggtggtgccg gcgacgttgc atgtggatgt gccgtcgccg ttggtggatt ggtcttcggg 10020 
tggggtggag ttggtgacgg aggcgcggga ttggccggtg gtgggtcgtg tgcgtcgtgc 10080 
gggtgtgtcg gcgtttgggg tgtcggggac gaatgcgcat ctgattttgg agcaggcccc 10140 
cgaattcgac gatccggttg ttaccgacac cgacaccgat gctggtgtgg gtaggggtct 10200 
atcggtggtt ccggtggtgg tttcgggtcg ttcgacggcg gctttgcgcg cttatgcggg 10260 
ccggttgcgt gaggtgtgcg cgggtctttc cgatggtgcc ggtctggtga atgtgggttg 10320 
gtcgttggtg tcgtcgcggt cggtgttcga gcatcgggcg gtcgtgtttg gtgggggtgt 10380 
cgccgaggtg gtggcgggtt tggatgcggt ggtttccggg gcggtggctt cgggttcggt 10440 
ggtggtgggt tcggtggcgt cgggtgttgc tggtggtggt ggtcgggtgg tgtttgtgtt 10500 
tccgggtcag ggttggcagt gggtgggtat gggtgcggcg ctgctggacg agtcggaggt 10560
gtttgctgag tcgatggtgg agtgtggtcg ggcgttgtcg gggtttgtgg attgggattt 10620 
gttggaggtg gtgcggggtg gggcgggtga gggggtgtgg ggtcgggttg atgtggtgca 10680 
gccggtgtcg tgggcggtga tggtgtcgtt ggcgcggttg tggatgtcgg tgggtgtggt 10740 
gccggatgcg gtggtgggtc attcgcaggg tgaggttgct gcggcggtgg tggggggtgt 10800 
gttgagtgtg gctgatgggg cgcgggtggt ggcgttgcgg tcgcgggtaa ttggtgaggt 10860 
gttggccggt ggtggtgcga tggtgtcggt cggactgccg atcgtggatg cgcaggaacg 10920 
gttggcgggg tggggtggtc ggttgggtgt ggcggcggtg aatggtccgt cgttgacggt 10980 
ggtgtcgggg gatgtggatg ctgctgtggg gtttgttggt gagtgtgagc gggatggggt 11040 
gtgggtgcgg cgggtggcgg tggattatgc gtcgcattcg gcgcatgtgg aggcggtgga 11100 
ggggatgctg tcggggttgt tgggtggttt gtgtccgggg cggggtgtgg tgccgtttta 11160 
ttcgtcggtg gtgggtggtg tggttgatgg ggtgggtttg gatggtgggt attggtatcg 11220 
gaatctgcgt gagcgggtgt tgttttcgga tgtggtgggg cggcttgttg gggatgggtt 11280 
ttcggggttt gtggagtgtt cggggcatcc ggtgttggcg ggtggggtgt tggagtcggt 11340 
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ggcggtggtg gatccggatg tgcggccggt ggtggtgggg tcgctgcgcc gtgatgatgg 11400 

tgggtggggc cggtttttga cgtcggtggg tgaggcgttc gtcggcggga tgagtgttga 11460 

ctggaagggt gtgttcgcgg gggcgggcgc gcggttggtt gacctgccga cgtatccgtt 11520 

ccaacgccgc cactactggg caccgactcc caccaacccc gccaccaacc ccgccacggg 11580 

cgacaccacc accgccgacc cggtgggtgg cgtgcggtat cggatcacct ggaaaccgtt 11640

gccgacggac gacccccgac ccctcaccaa ccgctggcta ctcatcgccg acccggggac 11700 

cgccggctcg gagcttgccg cagacatcac agcagcgctc attcgcaggg gcgccgaggt 11760 

cgagttgctg gccgtggacc cgctcgcggg ccgggcccgg atcgccgaac tgctcgccac 11820 

cacgacggct gggccggtgc cgctgtcggg cgccgtgtct cttctcgggc ttgtgcagga 11880 

cgcgcatcct caacacccct ccatcggaat gggcgtggtc tcgtcgctgg cgctggtgca 11940 

ggccatcggt gacgcgggag ccgagactcc tttgtggagc gtcacgcagg gggcggtcgc 12000 

tgtggtgccc caggaggcgc cggatgtgtt cggtgcgcag gtgtgggcgt tcgggcgggt 12060 

ggccgccctg gaactgccgg accgctgggg cggcctggtc gaccttccgt ccgtaccgaa 12120 

tgcccggatg ctggaccagc tcgccaacgc cctcgccgga gcggacggcg aggaccagat 12180 

cgcggtacgc ggctcgggga tctacgggcg tcgggtgacg cgcgcggcgg gcactgcgcg 12240 

ccgggaatgg cgccctcgcg ggaacatcct ggtgaccgga ggtacgggaa gtctgggtgg 12300

ccgggtggcc cggtggctcg ctcgcaacgg tgccgaacac ctcgttctca ccagtcgtcg 12360 

gggtgccgac gccccggggg cggcagaact ggaagctgat cttcgcgcgc tcggtgtcga 12420

ggtgaccatg gccgcctgcg atgtagcgga ccgggctgcg ctgtccgacg tcctggcggc 12480 

gcatccgccc actgcggtct tccacaccgc cggagtcctg cacgacggtg tgatcgacac 12540 

gctcgccgcc ggacacatcg acgaggtctt ccgtccgaag accgctgccg cgctgctgct 12600 

cgacgaactc acccagcacc aggagctgga cgccttcgtc ctcttctcat cggttaccgg 12660 

agtctggggc aacggcggcc aggcggcgta cgcggcggcg aacgcatcgc tggacgccct 12720 

ggcggagcga cgtcgtgccg caggtcttcc cgccacctcc atagcttggg gactgtgggg 12780 

cggcggtggc atggcggagg ggatcggcga gcagaacctg aaccgccgtg gcatcacggc 12840 

cttggacgcg gagctcggca tcgccgctct gcagcaggcc ctcgaccgcg atgacgtgtc 12900

tgtcaccgtc gccgacgtcg actggacggt tttcgctccg cgtcttgccg acctgcgctc 12960 

ggggcggctc ttcgacgggg tgcccgaggc caggagcgcg ctcgatgccc ggaaagtgga 13020 
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caccgagtcg ccgagcgccg gccttgcgca gcgggtggcg gggatgcccg acgcggaacg 13080 
gcagcgggtc ctcctcgaaa cggtgcgggc ggcggccgcg gcggtcctga ggcacgagac 13140 
ggtggatgcg gtcgcgccca cccgggcctt caaggacgcc ggcttcgact cgctcacggc 13200 
gctcgaactg cgcaaccacc tcaacagcac gaccggtctg agtctgcctc cgacggtggt 13260 
cttcgaccac cccaccccgt ccacgttggc gaagttcctg gagggcgtcc tcgtcggcgc 13320 
ttctgccgag gaagtcccgg tgactgccgc agccgtgccc gtcgacgagc ctattgccat 13380 
cgtcggcatg gcctgccgct accccggcgg agccgacact cccgagaagc tctgggacct 13440 
cctgctggcc ggtgctgacg tcatcggccc agcccccgac gaccggggct gggacgtgga 13500 
ctccttcttt gatcccgtgc cgggcgccgc ggggaagtcg tatgcgcggg agggggggtt 13560 
tgtgtatgac gcggggatgt tcgatgcgga gttctttggt gtgtcgccgc gtgaggcggt 13620 
ggcgatggat ccgcagcagc gcttgttgtt ggagacgtcg tgggaggcgt tggagcgtgc 13680
gggaatcgat ccggcgggtc tgcggggtag ccggaccggc gtgtactccg gcctgaccca 13740 
ccaggagtat gccgcccgtc tgcacgaggc tccgcaggaa ctcgagggct atctgctcac 13800 
cggcaagtcg gtgagcgtcg cgtcgggtcg tgtttcgtat gtgttggggt tggagggtcc 13860 
gtcgatttcg gttgatacgg cgtgttcgtc gtcgttggtg gcgttgcatt tggcgtgtca 13920 
ggggttgcgg ttgggtgagt gtgatgtggc gttggcgggt ggggtgacgg tgattgcggc 13980
gccggggttg tttgtggagt tttctcggca gggtgggttg tcgggtgatg ggcggtgtcg 14040
ggcgtttgcg ggtggtgcgg atgggacggg gtggggggag ggtgcggggg tggtggtgtt 14100 
ggagcggttg tcggtggcgc gggagcgtgg tcatcgggtg ttggcggtgg tgcggggttc 14160
tgcggtgaat caggatggtg ggtcgaatgg tttgacggcg ccgtcggggg tggcgcagcg 14220 
tcgggtgatt ggtgcggcgt tggtggcggc gggtttgggt gtgtcggatg tggatgtggt 14280 
ggaggcgcat gggacgggga ctcggttggg tgatccgatt gaggctgagg cgttgttggg 14340 
gtcgtatggg cggggtcgtg tgggtggggc gttgttgttg ggttcggtga agtcgaatat 14400 
tggtcatacg caggcggctg cgggtgtggc gggtgtgatc aagatggtga tggcgttgcg 14460
ggcgggggtg gtgccggcga cgttgcatgt ggatgtgccg tcgccgttgg tggattggtc 14520 
ttcgggtggg gtggagttgg tgacggaggc gcgggattgg ccggtggtgg gtcgtgtgcg 14580 
tcgtgcgggt gtgtcggcgt ttggggtgtc ggggacgaat gcgcatctga ttttggagca 14640 
ggcccccgag ttcgacgatc ctgccgattc cgattccgat tccgattccg attccgatgc 14700 
cggtgtcgtg gatggcggcg agggtggtgt tggcaggagc ttgtcggtgg ttccggtggt 14760 
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ggtgtcgggt cgttcggtgg gggctttgcg ggcgtatgcg ggtcggttgc gtgaggtgtg 14820
cgcggggttg tctgacggtg gtggctccgg tggtggttct ggtttggtgg atgtgggttg 14880
gtcgttggtg tcgtcgcggt cggtgtttga gcatcgggcg gtcgtgttcg gtgggggtgt 14940
ggaggaggtt gttgctggtc ttggtgcggt ggcttctggg gcggtggctt cgggttcggt 15000
ggtggtgggt tcggtggcgt cgggtgttgc tggtggtggt ggtcgggtgg tgtttgtgtt 15060
tccgggtcag ggttggcagt gggtgggtat gggtgcggcg ctgctggacg agtcggaggt 15120
gttcgccgag tcgatggtgg agtgtggtcg ggcgttgtcg gggtttgtgg attgggattt 15180
gttggaggtg gtgcgcggcg gggcgggtga gggggtgtgg ggtcgggttg atgtggtgca 15240
gccggtgtcg tgggcggtga tggtgtcgtt ggcgcggttg tggatgtcgg tgggtgtggt 15300
gccggatgcg gtggtgggtc attcgcaggg tgaggttgct gcggcggtgg tggggggtgt 15360
gttgagtgtg gctgatgggg cgcgggtggt ggcgttgcgg tcgcgggtga tcggtgaggt 15420
gttggccggt ggtggtgcga tggtgtcggt cggactgccg atcgtggatg tgcaggaacg 15480
gttggcgggg tggggtggtc ggttgggtgt ggcggcggtg aatggtccgt cgttgacggt 15540
ggtgtcgggg gatgtggatg ctgctgtggg gtttgttggt gagtgtgagc gggatggggt 15600
gtgggtgcgg cgggtggcgg tggattatgc gtcgcattcg gcgcatgtgg aggcggtgga 15660
ggggatgctg tcggggttgt tgggtggttt gtgtccgggg cggggtgtgg tgccgtttta 15720
ttcgtcggtg gtgggtggtg tggttgatgg ggtgggtttg gatggtgggt attggtatcg 15780
gaatctgcgt gagcgggtgt tgttttcgga tgtggtgggg cggcttgttg gggatgggtt 15840
ttcggggttt gtggagtgtt cggggcatcc ggtgttggcg ggtggggtgt tggagtcggt 15900
ggcggtggtg gatccggatg tgcggccggt ggtggtgggg tcgctgcgcc gtgatgatgg 15960
tgggtggggc cggtttctga cgtcggtggg tgaggcgttc gtcggcggga tgagtgttga 16020
ctggaagggt gtgttcgcgg gggcgggcgc gcggttggtt gacctgccga cgtatccgtt 16080
ccaacgacgc cactactggg cccagacctc gcccgctggc gtcgggacgg ccgcggcggc 16140
ccggttcggc atggagtggg aggaccatcc cctgctcggc ggtgcgctgt cggtcggggg 16200
ctccaggagc ctgcttctgg ccgggcatct gtcgctcgcc tcgcacgcct ggctgaccga 16260
ccatgccgtc tccggcaccg tgctgctgcc cggtacggcc ttcgtggaac tcgccctgca 16320
cgccgccgct gcggctggct gtccggaggt cgaggagctg cggctggagg ctcccctggt 16380
ggtgccggcc aggggcgggg tgcggctcca ggtgctcgtg gacgaccccg acgacggatc 16440






WO 03/010193



PCT/CA02/01177 



cgaccgccgc gcggtaagcg tgttctcccg ggacgatgcg gcgccggccg agtccgcctg 16500 

gacgcggcac gcggtgggcg tcctggccgc gcggtcgcgg cctgcaccgg ctgcgccctg 16560 

gcacaccgac gcctggccac cttcgggcac ggagccggtc gacgtggccg acctgtatga 16620 

gcggttcgcg gcgctgggct acgagtacgg ggaggcgttc gccgggctcc agggggtctg 16680 

gcggggggac ggcgaggtgt tcgccgaggt gcggctgccc gaccgggtca gcgcggaggc 16740 

cattcgcttc gggctgcatc ccgcgctgct cgacgccgcc ctgcaggggt ggttggcggg 16800 

cgacctcgtc ggcgtccccg agggcagtgt gctgctgccc ttcgcctggc agggcgtcgt 16860

gctccacgcc accggcgccg acactctgcg ggttcgcatc ggccggtccg gtgactcggc 16920 

cgtctgcctg cacgcggtgg acccggccgg tgctccggtc ctctcgttgg acgccctggc 16980 

cctgcgtccg ctcgtccggg aacgcctcgg gctgcccgcc gatgccggag ccggggcgtt 17040 

gtaccgggtc ggctggcggc ggcaggccgc cgttgccggg gcagccgacc ggcggtgggc 17100 

ggtcgtggcc ccgaacggtg ccgaggcgga cggggccgcc gagccgcacc ggtggccggt 17160 

cgccgccgtc gacgtgcaca ccgacgtgga ctcgctgcgg gcggccctgg acgcgggcgc 17220 

ggaactgccc gccgtcgtcc tcgccgactt ccggagggcc gccggctgga gcgtcgacag 17280 

ttcgctggcc gccggcccgt cgcccaacga cggcgcggtg ggcgacggcg cggtgggcga 17340 

cgcccgggcc ggggccgtcc gggcggcgac ccgggccggg ctggatctgc tgcaacgctg 17400 

gctggccgac gagcggttca tcgcggccag gctcgtggtg gtcaccgaac gggccgtggc 17460 

cgccgggccg gacgaggacg tgccgggcct cgtccacgcg ggactgtggg gcctgctccg 17520 

gtcggcccaa tcggagcacc cggaccgctt cgtgctggtg gacgtcgacg cggacgacag 17580 

ctcgctcgcg gcgctgccgt cggccctcgc catggacgcg ccccaactgg tggtgcgggc 17640 

cggtcagatc ctgctgcccg agatcgagcc ggtgcggccc gtacccgagc cggagcaggc 17700 

ggaacccgaa ccgggggccg tcctggaccc cgacggcacg gtcctgctca ccggcgcgac 17760 

cggcacgctc ggcgggctgc tcgcccggca cctggtgacc acccgtggtg cgcgccggct 17820 

gctgctggtc agccgcagcg gtccggacgc ccccgatgcc ggccggctga ccgaggagct 17880 

gaccgggctc ggcgcccacg tgacgctggc cgcctgcgac accacggatc gcgccgcgct 17940 

ggccggcgtc ctgggcggca tccccgccga gcatccgctg accgccgtgg tgcacgtggc 18000 

cggcgtactc gacgacgggg cggtgcaggc gctcaccccc gagcgggtcg acgcggtgct 18060 

ccggccgaag gtggacgcgg cactgcacct gcacgaactg accgcggggc tgccgctggc 18120 

cgcgttcgtg ctgttctccg gggcggcggg gatcctgggc cggcccggcc aggccaacta 18180
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cgcggcggcg aacaccttcc tggacgccct ggcgcagcac cgacgggccc ggggcctgcc 18240
cggcgtctcc ctcgcctggg gcctgtgggg gctggccagc gacatgacgg gccacctggg 18300
cgagcaggac ctgcggcgga tgcggcgctc cggcatcgcc ccgatgaccg gcgaggaggg 18360
cctcgcgctg ttcgacctgg ccctcgacct ggcccgggac gaaccggtgc tcgtaccggc 18420
ccgactggac ccggcggcgc tgcgccggga gtgggccgcc aacggaccgg gcgccgtccc 18480
ggtcctgctg cggggtctgg tgccggcggc tccgctccgt cgcgcggccc cgtcgggcgc 18540
cgccggcggt gcgcccgtgc ccgccgtcgc cgcgccgcag caggcggacg agctgcgcgg 18600
gcaactggcc gggaaggacg cgcaggccca ggtccggcag ctgctggatc tggtacgcgc 18660
ccatgtcgcc ggggtgctcg ccctccggga agcggcggac gtggacccgg gcagaccgtt 18720
ccgcgaggtc ggattcgact cgttgaccgc agtcgaactg cgcaaccggc tgggctcggc 18780
gaccggcctg cggttggcac cgagcctggt gttcgaccat ccgaccccgt cggccgtggc 18840
cgagcacctc gtggaccgcc tcgccgccga gggggcggct gacgagggcg cggcggcact 18900
gaccgggctc gacgcagtgg ccgcggcgct cggcgggatg cggacggacg acgttcgccg 18960
ggacatcgtc cgcaggcggc tggaggagat gctcgccctg gtcggcgggc cacggtccgg 19020
gccggcaggt gacgggctgg tggatgccac ggtcgccgag cgactggact cggcttccga 19080
cgacgaactc ttcgccctga tcgaggagca gctgtgaacc ccgaccgagg agagggccgg 19140
caggtgaccg cgaacgagga ccggatgcgt gagtacctca agcgggtcac cgccgagctg 19200
gccgggacgc ggcgacgcct gcgcgagctg gaggacagcg cgcgtgagcc catcgcgatc 19260
gtgggcatga gctgccggtt gccgggcggg gtgagcacgc ccgaggacct gtggcggctg 19320
gtcgaggccg gtaccgacgc gatctccggc ttccccgacg accggggctg ggatgtcggg 19380
aggctctacg acccggatcc ggactcgacc ggaacgagct acgtgcgcga gggcggcttc 19440
ctctacgact gcgccgagtt cgacccggag ttcttcaccg tctcgccccg cgaggcgctg 19500
gccatggacc cgcagcagcg gctgctgctg gaggccgcct gggagacctt cgaacgggcg 19560
gggatcgccc ccgactcggc ccgcggcacc cgcaccgggg tctacgtcgg ggtgatgtac 19620
gacgactacg gcagccggct gtcggaggtg ccgaaggacc tggagggcta cctggtcaac 19680
ggcagcgcgg gcagtgtcgc gtcgggccgg atcgcgtaca cgctggggtt gcaggggccg 19740
gcggtgacgg tcgacacggc ctgctcgtcg tcgctggtcg cgttgcacct ggccgtgcag 19800
gcgctgcggt cgggcgagtg tgagctggcc ctggcgggcg gggcgacggt gctcgccacg 19860
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ccgacgatgt tcgtcgactt cgcccggcag cgcggtctcg ccgaggacgg ccgttgcaag 19920

gcgttcgcgg acgccgccga cgggaccggg ttcggcgagg gcgtggggat gctgctggtg 19980 

gaacggctct cggacgcggt ccgcaaccgt cgccaggtgc tggccgtcgt gcggggcagc 20040 

gcggtcaacc aggacggggc gagcaacggc ctgaccgccc cgaacggtac ggcccagcaa 20100

ctggtcatcc ggcaggcgtt gaccaacgcg gggctggccg cggacgaggt ggacgcggtg 20160 

gaggcacacg gcaccggcac ccggctgggc gatccgatcg aggcgcaggc gctgctggcg 20220 

acgtacggcc agggccggcc ggcggaccgg ccgctcctgc tgggatccct gaagtccaac 20280

atcggccaca cccaggccgc cgcaggggtc gccggggtga tcaagaccgt gctggcgctg 20340 

cgtcacgcgc ggctgccccg gaccctgcac gtcgatcgcc cctcgacccg ggtggactgg 20400 

tcgtcgggcg cggtgcggct gctgaccgag gggcggccct ggcccgatca cggcgaccgg 20460 

ccccgccggg ccggggtctc ctcgttcggc gcgagcggca ccaacgcgca cgtcatcctg 20520 

gagagcgccc ccggtgcggc ggcgggggcg accggggcga cggacctctc ggccccgccg 20580 

gcatccgtcg cccaccatcc ggccacggcc acggccacgg ccccggcggc gacggtgccc 20640 

actgcccacg aaccggcggg gacggccggc gacgaccccg tctgggtcct gtccggccgg 20700 

accgaggcgg ccctgcgcga gcaggcccgg cggctacacg cccacctgac atcccgggcg 20760 

cggcccgagc ccgccgacgc cgtggcccgc gcgctggcgc gctcccgcac cgcgttcgcg 20820 

taccgggccg ccgtgctggg ccgggacgac accgcgcggc tcgacggcct ccacgcgctc 20880 

gcggcgggtc gcagcgccgc ggggctcgtc accgggcggg ccgtgccgga gcggcgcgtg 20940 

gccttcctct tcaccgggca gggcagccag cgaccgggcg cgggccggga actgtacgcc 21000

cggcatcccg ccttcgcaca ggccctggac ggcgtcctcg cggaactcga ccggcacctg 21060 

gaccggccgc tgcgcgccgt catgctcgcc gagccgggca ccgaggcggc ggcgctgctg 21120 

gacgacaccg cgtacaccca gcccgccctg ttcgcgctgg aggtggcgct gttccggctg 21180 

gtcacgagct gggggctgcg gcctgacgcc ctgctgggcc actcggtcgg ggagatcacc 21240 

gcggcgtacg tcgcgggcgt cctcaccctg ccggacgccg cccggctggt ggcggtgcgc 21300 

ggtcgactca tggcggacct gcgggccggc ggtgcgatgg ccgcgctcca ggccgccgag 21360 

agcgaggtcg accccctgtt ggcggggcgg gagggcgaac tgtcgatcgc agcggtcaac 21420 

gggccgcagg caaccgtgat cgcgggcgac gaggcggccg tcgaggagca ggtcgcgctg 21480 

tggcgtgacc ggggtcgccg ggccaggcga ctgcgggtcg gccacgcctt ccactccgta 21540 

cggatggacg ggatgctcgc cgagttcgag aaggcgatgg gtgatctccg tgccggcgag 21600 
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ccgacgatcc ccgtggtcgc caacgtcagg ggggcgatcg cgtccggcac cgacctccgt 21660
acggccgggt actggatccg gcacgcccgc gagccggtgc gtttcctcga cggcatgcgt 21720
gcgctgcggg ccgagggcgt cgacacgttc gtggaactcg gccccgacgg agtgctcacg 21780
gcgatggcgc gcgactgcct ggcggatccc gccgacccgg tggatctcgc ggacgccgcc 21840
gagcccgccg gggccgcgga gcccgaccgc tccctgctgt tcctgcccac cctgcgccgg 21900
gaccgcgacg acgcagtggc cgtgcgggag gccctggcat ccgtccacgt gcacgggctt 21960
cccgtcgacc cggtcgcgcc gctcggcgac ggcccgctcg ccaccgacct gcccacctac 22020
ccgttccagc ggtcccgcta ctggctcgac ccgcgtcccg gggcacgcga cctgaccgcc 22080
gtgggcctcg acgtggccgg gcacccgctg ctcgccgtcg ccgtggacct gcccgacggc 22140
gccggcacgg tctggagcgg tcagctctgc gtgcggacgc atccgtggct cgccgaccac 22200
agcgtgtggg ggcgcacggt ggtgccgggg accgcgctgc tggagatcat gcaccgagtg 22260
cgcgccgagg tgggctgcac ccgggtcgcg gaactgacct tcgaggcgcc gatggtgctg 22320
gccgacgacg ggggcgtccg cgtgcgggtc gtcgtcgacg gaccagacgc cgacggggcc 22380
cgccaggtcc ggatccactc cgcaccggtg gggcccgagc ctccccactg gacccggcac 22440
gcctcgggcc gcgtcgacag cgccgcgccg gggccggccg ccggcccacc cgcgtgggac 22500
gccggccctg gcagcaactg gccgcccgag ggggcggagc cggtgggcgt cgagagcgag 22560
tacgagcgct tcgccgacaa cggcatcgga tacggccccg ccttccgagg gctgcgcgcc 22620
gcgtggcgtc gcgggaacga gacgttcgcc gaggtccggc tccccgaggg gtacgccgcc 22680
gaggcgggcg actacgccgt ccatccggca ctgctggacg cggccctgca cgcgatcgtc 22740
ttcggtgacc agtttcccgg tggggcacac gggatgctgc cgttcgcctt caccgacgtg 22800
cgggtgttca gctccggcgc cgaccggctc cgggtgcgca tcgcgcccgc cgatgccgac 22860
tcggtctgcg tgaccgtcgc cgacggcgac gggacgccgg tcctcgccgc agccaccctg 22920
gcgttgcgcc gggtcgccgc cgaccggatc gcggcgaccg tcaccggcca ggcaccgctg 22980
taccggttgg agtggtccgc cgtgcggccc gccccggtgg ccaccggggc gcggttcgcc 23040
gtcgtcggcg cggacgcccc gctgccgtcc ggtgcgctgg gggccggggt gcccgtccag 23100
gcgtacccgg acctgggcgc gctggccggc gcgttggcca ccaacggggc accgggccac 23160
gtgctcgtcg acttccgccg ccgcgccgac ggcccggcag ggcggcagcc cggtgacgtg 23220
ggtgcacgga cccgacgggc gctggccgtc gtccaggagt ggctcgccga cgaccgtttc 23280
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accggctcac ggctggtcgt gctcaccagc ggagccgtgg acgccggaac agccgtcacc 23340
gatccggccg ccgccggggt gtggggcctg ctgcgggtcg cccagaccga gcatccggac 23400
cggttcgtcc tcgtggacac cgacgaccac ccggattcgc tgcgtgccct ccccggggcg 23460
atcgttgcgg gcgagccgca gctggcactg cgggccggca cggccagcgt tccgggcctg 23520
gtgcgggtgc cggccggcac cggtgccgcc ccgccgtggg ccgcagccgg caccgtcctc 23580


gtcaccgggg gcaccggcat gctcggcggc gcggtggccc [illegible] [illegible] 23640
ggggtccgcc gcctgctgct ggtcggccgg cgcgggccgg [illegible] [illegible] 23700
ctgacccggg aactggagga gctgggagcg tccgtccgcg [illegible] [illegible] 23760
gatcgtggcg cggtgacgcg cctgttggcc ggggttcccg [illegible] [illegible] 23820
gtggtgcact cggccggcct gcccgacgac ggcgtgctga [illegible] [illegible] 23880
gtcgcggcgg tgctccgcgc caaggcggac gcagcggtca [illegible] [illegible] 23940
catctcgacc tcaccgcctt cgtgctgttc tcgtcggtag [illegible] [illegible] 24000
gggcaggccg ggtacgccgc cgcgaacgcc ttcctcgacg [illegible] [illegible] 24060
ggccaggggc tgcccgccac cgccctggcg tgggggccgt [illegible] [illegible] 24120
ggcctcggca ctgcggacgt ggcacggctg cgccggtccg [illegible] [illegible] 24180
gacgacgcgc tcgttctctt cgacgccgcc tgctcccgac [illegible] cggcggcggc [illegible] 24240
gtccgcctcg atccggcggt gctgcggtcc cacgccgccg [illegible] [illegible] 24300
gtcctgctcg gtccgagccg tgcgcacccg agggacggta [illegible] [illegible] 24360
gccgccctcg ccgcgctgct gaccggcagg tcggcggccg [illegible] [illegible] 24420
gacctggtgc ggacggaggc cgccgccgtt ctcgggcatg [illegible] [illegible] 24480
acgcagcggg ccttccgcga cgccggcttc gactcgctca [illegible] cctccacaac 24540


cggctcggcg cggccacggg cctcagcctg ccggccgccg tcgtcttcga ccacccgacc 24600
ccggcggccc tggccgccta tctgcggacc gaactggacc gccggtcgcc caccgggcaa 24660
cagttcccga cggacgccgc cggtgttctg gccatgctcg accgcctgcg ggacggaatc 24720
gcgacggtcg tcagggacga cgccgaccgg acccgcgcag ccgacctgtt gcgtgtcctg 24780
ctcgccgagg tcggcgggcc cgggacgggc ccgccccgcg acaccgacgg cggctccggc 24840
ggcgaggtca gcgaccgcct ccggaccgcc tccgacgagg aactgttcga cctgctcgac 24900
agcgatttcc gactggcgta gcgccggccg gagcactgcc cgctcgaatc gaccgacccc 24960
gggaagacac tcggatcaca gggg^aagcg ccgtgtctgt caacaacgaa gacaagcttc 25020
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gcgagtatct gcgtcgtgcc atggcggatc tccatgagtc ccgcgagcgg ttgcggcagt 25080 

acgagtccgc tgctgctgtg gatgatccgg tggtggtggt ggggatgggt tgtcgttttc 25140 

cgggtggggt ggtgtgtgcg gagggtttgt gggatttggt gttggggggt ggggatgcgg 25200 

tgtcggggtt tccggtggat cggggttggg atgtggaggg gttgtttgat ccggtgcggg 25260

gtgtggtggg gaagtcgtat gtgcgggagg gggggtttgt gtatgacgcg gggatgttcg 25320 

atgcggagtt ttttggtgtg tcgccgcgtg aggcggtggc gatggatccg cagcagcgtt 25380 

tgtttttgga ggtgtcgtgg gaggcgttgg agcgtgcggg gattgatccg ttgggtttgc 25440 

ggggttcgcg gacgggtgtg tatgtggggg tgatgggtca ggagtatggg ccgcggttgg 25500 

tggagtcggg tggtgggttt gagggttatt tgttgacggg gacgtcgccg agtgtggtgt 25560 

cgggtcgtgt ttcgtatgtg ttggggttgg agggtccgtc gatttcggtt gatacggcgt 25620 

gttcgtcgtc gttggtggcg ttgcatttgg cgtgtcaggg gttgcggttg ggtgagtgtg 25680 

atgtggcgtt ggcgggtggg gtgacggtga ttgcggcgcc ggggttgttt gtggagtttt 25740 

ctcggcaggg tgggttgtcg ggtgatgggc ggtgtcgggc gtttgcgggt ggtgcggatg 25800 

ggacggggtg gggggagggt gcgggggtgg tggtgttgga gcggttgtcg gtggcgcggg 25860 

agcgtggtca tcgggtgttg gcggtggtgc ggggttctgc ggtgaatcag gatggtgggt 25920 

cgaatggttt gacggcgccg tcgggggtgg cgcagcgtcg ggtgattggt gcggcgttgg 25980

tggcggcggg tttgggtgtg tcggatgtgg atgtggtgga ggcgcatggg acggggactc 26040 

ggttgggtga tccgattgag gctgaggcgt tgttggggtc gtatgggcgg ggtcgtgtgg 26100

gtggggcgtt gttgttgggt tcggtgaagt cgaatattgg tcatacgcag gcggctgcgg 26160 

gtgtggcggg tgtgatcaag atggtgatgg cgttgcgggc gggggtggtg ccggcgacgt 26220 

tgcatgtgga tgtgccgtcg ccgttggtgg attggtcttc gggtggggtg gagttggtga 26280 

cggaggcgcg ggattggccg gtggtgggtc gtgtgcgtcg tgcgggtgtg tcggcgtttg 26340 

gggtgtcggg gacgaatgcg catctgattt tggagcaggc ccccgagttc gacgatcctg 26400 

ccgattccga ttccgattcc gattccgatg ccggtgtcgt ggatggcggc gagggtggtg 26460 

ttggcaggag cttgtcggtg gttccggtgg tggtgtcggg tcgttcggtg ggggctttgc 26520 

gggcgtatgc gggtcggttg cgtgaggtgt gcgcggggtt gtctgacggt ggtggctccg 26580 

gtggtggttc tggtttggtg gatgtgggtt ggtcgttggt gtcgtcgcgg tcggtgtttg 26640 

agcatcgggc ggtcgtgttc ggtgggggtg tggaggaggt tgttgctggt cttggtgcgg 26700 
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tggcttctgg ggcggtggct tcgggttcgg tggtggtggg ttcggtggcg tcgggtgttg 26760 

ctggtggtgg tggtcgggtg gtgtttgtgt ttccgggtca gggttggcag tgggtgggta 26820 

tgggtgcggc gctgctggac gagtcggagg tgttcgccga gtcgatggtg gagtgtggtc 26880 

gggcgttgtc ggggtttgtg gattgggatt tgttggaggt ggtgcgcggc ggggcgggtg 26940 

agggggtgtg gggtcgggtt gatgtggtgc agccggtgtc gtgggcggtg atggtgtcgt 27000 

tggcgcggtt gtggatgtcg gtgggtgtgg tgccggatgc ggtggtgggt cattcgcagg 27060 

gtgaggttgc tgcggcggtg gtggggggtg tgttgagtgt ggctgatggg gcgcgggtgg 27120 

tggcgttgcg gtcgcgggtg atcggtgagg tgttggccgg tggtggtgcg atggtgtcgg 27180 

tcggactgcc gatcgtggat gtgcaggaac ggttggcggg gtggggtggt cggttgggtg 27240

tggcggcggt gaatggtccg tcgttgacgg tggtgtcggg ggatgtggat gctgctgtgg 27300 

ggtttgttgg tgagtgtgag cgggatgggg tgtgggtgcg gcgggtggcg gtggattatg 27360 

cgtcgcattc ggcgcatgtg gaggcggtgg aggggatgct gtcggggttg ttgggtggtt 27420 

tgtgtccggg gcggggtgtg gtgccgtttt attcgtcggt ggtgggtggt gtggttgatg 27480

gggtgggttt ggatggtggg tattggtatc ggaatctgcg tgagcgggtg ttgttttcgg 27540 

atgtggtggg gcggcttgtt ggggatgggt tttcggggtt tgtggagtgt tcggggcatc 27600 

cggtgttggc gggtggggtg ttggagtcgg tggcggtggt ggatccggat gtgcggccgg 27660 

tggtggtggg gtcgctgcgc cgtgatgatg gtgggtgggg ccggtttctg acgtcggtgg 27720 

gtgaggcgtt cgtcggcggg atgagtgttg actggaaggg tgtgttcgcg ggggcgggcg 27780 

cgcggttggt tgacctgccg acgtatccgt tccaacgccg ccactactgg gcaccgactc 27840 

ccaccaaccc cgccaccaac cccgccacca accccgccac caaccccgcc acgggcgaca 27900 

ccaccaccgc cgacccggcg ggtgacctgc ggtatcggat cacctggaaa ccgttgccga 27960

ccgacgaccc ccgacccctc accaaccgct ggctgctgat ggtgcccgag gcgctggccg 28020 

gtgacggggt ggtggcgggc gtacggcagg cgctggccgc gcgtggcgcc tccgtcgaac 28080 

tgctgaccgt cggcaccgcc gaccgggccg gccttgccgc gctcctgacc tccgccgccc 28140 

ccggcgaccc ggaggcggcc ggcccggcgg gcgtggtctc cctgctggcg ctcgccgagg 28200 

gcgcggacgc gcgccacccg gccgtaccgc tcggcctgac cgcctcgctc gccctgatcc 28260 

aggcattggc ggacgcgggg acgcaggccc gcctctgggc ggtcacccgg ggggccgtcg 28320 

ccgtgtcctc cggcgaggtg ccggacgccg ggcaggccca ggtgtggggg ctcggccggg 28380 

tcgcggccct cgaactgccg gaccgatggg gcgggctggt ggacctgccg gcgctcaccg 28440 
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gggagcgtgc cttcgcgcag ctcgccgatg tcgtgggcgg ctcgaacggc gaggaccagg 28500
tcgccgtacg ggcctccggc gtctacggtc gacgcctcgt gcgttcgcgc gccaccgtca 28560
cgtccggcga ctggccggcc cggggcacca tcctcgtcgt cggggacacc ggcccggtcg 28620
ccgcgctcct ggccggccgc ctcctcggcg acggggcggc gcacgtggtg ctcgccggcc 28680
cggccgccgc gtccaccgtc gggctcaccg gcggggccga ccgggtggcc ctgatcgact 28740
gcgacccgag cgaccgggac gcgctcgccg ggctgctcgg cgcgtaccgg cccacgacga 28800
tcgtggtggc tccgcccgcc gtcgcgctca ccgccctcgc cgagaccacg ccggaggact 28860
tcgtcgccgc cgtcgccgcg aagacgacga cggcagtgca cctcgacgcc cttgcggcgg 28920
aggcggaact ggagctcgac gcgttcgtcg tcttctcctc ggtctccggc acctggggcg 28980
gcgcggggca cggcggctac gcggcgggca ccgcccggct ggacgcgctg gtcgaggaga 29040
ggcgggcccg tggcctgccc gccacggcga tcgcgtggac gccgtgggcc gacgcgacca 29100
cagccgccgg cgggcaggca cccgatgcca gcgccggcgg gcacgaaccc gacacgaggg 29160
ccgggggccc cgaccgcgaa ctgctgcgcc ggggtggcct caccccgttg gacccggggg 29220
ccgcgctgga cgtgctgcgc ggggcggtgg cgcggggcga gggcctggtg accgtggccg 29280
acgtcgactg ggcgcggttc gtcgcctcgt acaccgcggc ccggcccacc acgctcttcg 29340
acgaactgcc cgagctgcgg gcgacccggg aggcggagca caccccggcc gaggactcgt 29400
cggccggcgg cgaactggtc cgtgccctca gcggccggcc cgcggccgat cagcaccgga 29460
cgctgctgcg gctggtccgt gcgcacgtcg cggccgtcct ggggcacgac gaggccgagg 29520
cggccgatcc ggaccgggcg ttccgggaac tcggcttcac ctcggtgacg gcggtggacc 29580
tgcggaaccg gctgaacgcg gccaccgggc tgaacctgcc ggcgtccgtc gtcttcgacc 29640
atcccagcgc ccgggtgctg gccgcgtacc tgcgtgccga gctgctcggg ccggaggccg 29700
acgaggacac ggcggaggcc gtcgccccgc cgtccgcgcc ggccggggcg ggcgacgacg 29760
agccgatcgc ggtgatcggg atggcctgtc ggttcccggg cggggtcgac gcccccgacg 29820
acctgtggga tctgctggcg aagggccgcg acgccatctc caggttcccc acgaaccggg 29880
gctgggacgt cgacggcctg tacgacccgg acccggaggc gcccggccgc acctacgtcc 29940
gcgagggcgg cttcctgcac gacgcgcccg acttcgatgc cgcgttcttc gggatctcgc 30000
cccgcgaggc cctcgccatg gatccgcagc agcgcctgct gctggagacc acgtgggagt 30060
ccctggaacg ggccgggttg gacccgaccg cgttgcgcgg cacccggacc ggggtgttcg 30120
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tggggaccaa cggccagcac tacatgccgc tgctgcgaga cggcgcggac gacttcgacg 30180 

gctacctcgg caccggcaac tcggccagcg tcatgtccgg ccggctctcc tacgtcttcg 30240 

gcctggaggg cccggcggtg accgtggaca cggcctgctc cgcctccctc gtggcgctgc 30300 

acctcgcggt gcaggcgctg cgccggggcg agtgcacgct ggccctggtc ggcggggcca 30360 

cggtgatgtc gacgccggac atgctggtgg agttctcccg gcagcgggcg atgtcgccgg 30420 

acggccggtc gaaggcgttc gccgccgccg ccgacggggt ggcgctcagc gagggcgccg 30480 

ccatgatggt ggtgcagcgg ctcgccgacg cggaggccgc cgggcacgag atcctggccg 30540 

tggtcaaggg ctcggccgtc aaccaggacg gggccagcaa cggcctcacc gccccgaacg 30600 

ggccctccca ggaacgggtc atccggcagg cgctggccga cgccggcctg cggccggacc 30660 

aggtggacgc ggtcgaggcg cacggcaccg gcaccgccct gggcgacccc atcgaggcgc 30720 

aggcgctgct cgccacgtac ggccgggacc ggccggcggg ccggccactg tggctcggct 30780 

cgctgaagtc caacatcggt cacacccagg ccgccgccgg catcgccggg gtgatgaagg 30840 

tgatcctggc gctgcggcac gacacgctgc cgcgcacgct gcacgtggac cggccgacgc 30900 

cccgggtgga ctgggcttcc ggggcggtgt cgttgctgac cgagccggtg ccgtggccgc 30960 

agggcgacga accccgccgg gcggcggtgt cctcgttcgg gatcagcggc accaacgccc 31020 

acgtgatcgt cgagcaggcg ccgccggtgg tgcgggaacc gatcgaccac gaggcggacg 31080 

aggtcaccgt cccgctgttc ctgtcggccc gggggagcgc cgcgctctgc gcccaggcgg 31140 

cacggctgcg ggcccggttg atcgaggaac ccgacctgga catcgccgag gtcggctaca 31200 

cgctggcggc cacccgggcc cgcttcgagc accgggccgt ggtgatcggg gagagccgcg 31260 

cggaggtcgg cgacgcgctc gccgcgctgg cccggggcga ggagcacccg tcgctgctgc 31320 

gggggcgggc cggcgcgagc gaccgggtcg cgttcgtctt tcccggccag ggctcgcagt 31380 

gggccgagat ggccgacggc ctgctcgacc gctccccggc cttccgggcg agcgcgtcgg 31440 

cgtgcgacga ggcgctgcgg gcgcacctcg actggtccgt gctggacgtg ctgcgtcgcg 31500 

tgccggacgc gcctgcgctg agccgggtcg acgtggtcca gccggtgctg ttcacgatga 31560 

tggtgtcgct ggcggcggcc tggcgggcgc tgggcgtgca cccgtccgcc gtggtcggcc 31620 

actcgcaggg tgagatcgcg gcggcccacg tggcgggcgg cctctcgctg gacgacgcgg 31680 

cgcgcatcgt cgccctgcgc agccaggcgt ggctgcggct ggccgggcag ggcgggatgg 31740 

tggcggtgtc gctccccgtc gacgcgctcc gcgcccgcct ggcgcggttc ggcgaccggc 31800 

tgtccgtcgc cgcggtcaac agccccggta cggcggcggt gagcggctac cccgacgcgc 31860 
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tcgccgaact cgtcgacgag ctgaccgccg agggcgtgca cgccaaggcg atccgggggg 31920

tggacacggc cgggcactcc gcgcaggtgg aggtgctgaa ggaccacctg atggccgccc 31980 

tcgccccggt gtcgccccgc agctcgcaga tccccttcta ctcgaccgtc acgggcggcc 32040 

tgctggacac cgcgctgctg gacgccgcct actggtaccg caacatgcgc gacccggtgg 32100 

agttcgagca ggcgacccgg gcgatgctcg cggacgggca cgaggggttc ctggagccca 32160 

gcccgcaccc gatgctgtcg gtgtcgttgc agggcaccgc ggccgatgcc ggggtcgccg 32220 

cgacggtgct ggggacactg cggcgcggca agggcggcgc ccgctggttc ggcatggcgc 32280 

tcgggctcgc ccacgcccac gggatcgaga tcgacgcgag tgtgctcttc ggaaccgact 32340 

cgcgccgggt cgacctgccg acgtacccgt tccagcgcga gcgcttctgg tatcacccgc 32400 

cggccgcgcg cggggacgtg gcctccgccg ggctcagcgg tgccgaccat ccgctgctgg 32460

gcggggcggt cgagctgcct gaccggggcg gccacgtgta tccggcccgg ctcggcgtcc 32520 

gacaccaccc gtggctcggc gagcatgccc tgctgggcgc ggcgatcctg cccggggccg 32580 

cgtacgcgga actcgccctg tgggccgggc ggcgtgacgg ggccggccgg atcgaggagc 32640 

tgaccctcga cgcgccgctg gtggtggccg acgagtcggc ggcgcaactg cggctcgtgg 32700 

tgggcccggc ggacgcggag gggcgccggc agctcaccgt ccactcgcgc gccgacggcg 32760 

cggacgcgga caccgcgtgg acccggcacg cgcagggcac cctcgtgccg gccgacgccg 32820 

acgccgccgg gagcggggac ccgggcgcgc cctggccgcc ggccggggcc gagcccgtcg 32880 

aggtggcggg cctgtacgac cggttcgccg accggggcta ccagtacggg ccgtcgttcc 32940 

ggggggtccg ggccgcctgg cgggccggcg acacggtgta cgccgaggtg gccctgcccg 33000

tcccgcagcc cgggagcccg cgcttcggtg tccacccggc gctgctcgac gcggcgttcc 33060 

aggcgatgag cctcggcgcg ttcttccccg aggacgggca ggtccggatg ccgttcgccc 33120 
tgcggggcgt gtcgtcgtcc ggggtcgggg ccgaccggct gcgggtcacc atcagcccgg 33180

ccggtgccga ggcggtccgg atcgcctgcg tcgacgagcg gggcaacccg gtcgtggtga 33240 

tcgactccct ggtggcgcgc gcggtgccgg tggaggcgct cacccccggc acccccggca 33300 

ccggggacgg cgcgctgcac cacgtcgcct ggaccgcccg gccggaaccg ggggtcgccg 33360 

ccgtgcagcg ctgggcggtc gtgggcgcgg ccgatcccgg gctggccggg ggcctggacc 33420 

gggcgggcgg cctctgcggg gcgtaccccg atctcgccgg tctggtcgcg gcggtggccg 33480 

aaggggcggc gctgcccgac gtggtcgcgg tgccggtccc gtcgggcgcg ccggtcgggc 33540 
ccgacgcggt gcgcgccacc gtgctcggcg ccctggacct gatccgggcc tggctcgcgg 33600

tcgagggccg gctggggctg gccaggctgg cgttcgtcac cacctcggcg gtggcggtcg 33660 

gcgacggcac cgagcacgtg gacccggtgt cggccgccct gtgggggctg gtgcgttccg 33720

cccagtccga ggagcccggc cggttcgtcc tcgtcgacct ggacgccgac ccggccagcg 33780 

cctcggccct gcccgccgcg ctcgccgccg gtgagccgca actggccgtfc cgcgccgggg 33840 

cggtgcacgt gccccggctg gttcggcacc gaccccgccc ggacggcccg ctgacgcccc 33900 

cggccggtgc cgcgtggcgg ctcgccgccg gtgggcaggg caccctggag ggcctggcgc 33960 

tggtcccggc cccggacgcc ttggcgccgc tggcccccgg gcaggtccgg gtcgcggtgc 34020 

gcgccgccgg agtgaacttc cgggacaccc tcatcgcgct cggcatgtac ccgggcacgc 34080 

cggtgctggg tgccgagggg gccggggtga tcaccgaggt cgcgccggac gtggccggct 34140 

tcgcccccgg cgaccgggtg ctgggcatgt ggaccggcgg cctggggccg gtggcggtcg 34200 

ccgacgcccg gatgctcgcc cgggttccgc gcggctggtc gtacgccgag gccgcgtcgg 34260 

tgccggccgt cttcctcacg gcccactacg cgctcaccag gctcgccggg atccgcccgg 34320 

ggcagtcgct gctggtgcac gcgggggccg gcggcgtcgg catggcgacc ctccaactgg 34380 

cccggcacct gggcgtggag gtctacgcca cggcgagccg gggcaagtgg gacaccctgc 34440

gtggcctcgg cctggacgac gcgcacatcg ccgactcccg cagcctcgac ttcgccggac 34500 

ggttcctggc cgccaccggg gggcgcggcg tcgacgtggt gctgaactcc cttgccgggg 34560 

acttcgtgga cgcgtccctg cggctgctgc cgcgcggcgg ccacttcctg gaactgggca 34620 

aggccgacgt ccgcgacccc gaccggatcg cggccgacca cccgggggtc ggctaccggg 34680 

cgttcgacct cgtcgaggct ggtccggagc tggtcgggca gctgctcggc gagctgatgg 34740 

agctgttcgc cgccggggtg ctcagcccgc tgccgttgac cgtgcgggac gtccggcggg 34800 

cccgggaggc gttccgcctg atcagccagg cccggcacgt cggcaaggtg gtgctgacca 34860 

tgccgcccgc gttcggcgcg tacggcaccg tcctggtcac cggcggcacc gggacgctcg 34920 

gcggcgccgt cgcccggcac ctggtcgccc ggcacggcgt acggcacctg gtgctcaccg 34980 

gccgcagcgg cccggcggcg gacggggcgt ccgcgctcgt cgacgagctg accgcgtccg 35040 

gcgcgtcggt gaccgtcgtc gcctgcgacg ccgccgaccg ggtcgcgctg cgccggctgc 35100 

tcgacggcat tccggccgcg cacccgctca ccgccgtcgt gcacgctgcc ggcgtcctcg 35160

acgacgccac catcaccgcg ctgaccgccg ggcaggtgga cgcggtgctg cggcccaagg 35220 

ccgacgcggt gatcaacctg cacgagttga cccgggaccg ggagctgtcc gcgttcgtgc 35280 
tgttctcctc ggcggcggcc ctgttcggca gcccggggca gggcaactac tcggcggcca 35340

acgggttcgt cgacgcgttc gcccagtacc gccgcgcgca ggggctccac gcggtgtcgc 35400

tggcctgggg cctgtgggcc gacagcagcc ggatggccgg gcacctcgac caggagggga 35460

tgcggcgccg gatggcgcgc ggcggcgtcc tgccgctcac caccgaccag ggcctcgccc 35520

tgttcgacgc cgcgcagctg gtggacgagg cgctccaggt gccgatccgg ctcaacgtcg 35580

gcgcgttgcg ggccgccggg agggtccccg cgctcctcgc cgacctggtg ccggcggcgg 35640

cgtcgggggc cccggccgcc accccgaccc gggacgacgc ggaccgcacg ctcgccgacc 35700

ggctcgccgg gctgaccgtg gccgaacagc gggagctggt gctggagagc gtgcgcggac 35760

acgcggcggc cgtcctcgga cacgccgacc cgcaggccgt cgacgccgac cgggccttcc 35820

gggaactcgg cttcgactcg ctgacggcgg tggagctgcg caatcggctg gccaccgcgt 35880

ccgggctgcg cctgccggcg acgctggtct tcgaccaccc caccccggaa gcgttggcgg 35940

agcacctgct cgccgggctc gcgcccgagc aggcccgggc cgagttgccg ttgctggccg 36000

agctgggccg gctggaggcg gccctggccg ccaccgacgg ggccgccctc gacgggctgg 36060

acgacctggt gcgccgggag gtcggcgtcc ggatcgcggc gctggccgcc aggtggggcg 36120

cggccggcga cgacgtggcc ggcagcgacg gcggcgggac ggccgacgcg ctcgagtccg 36180

ctgacgacga cgagatcttc gcgttcatcg acgagcggtt ccgcgcctga cgaccccgcg 36240

tacgcgaggg acggggtgga cgggaccgac ggtcaggagg gacgaggcgg catgtcgaac 36300

gagcagaagc tccgcgagta cctgcggttg accaccaccg agctggccag ggccaccgac 36360

cggctgcgcg cggtcgaggc gcgggcgcac gagccgatcg cgatcgtcgg catggcctgc 36420

cggtaccccg gcggggtcgg ctcaccggag gaactgtggg agctggtcgc ctcgggcacg 36480

gacgcgatct ccccgttccc cgacgaccac ggctgggacg gcgacgcgct gtacgacccg 36540

gacccggagg cggcgggccg cacctactgc cgcgagggcg ggttcctcgc cggggtcggc 36600

gacttcgacg ccgcgttctt cggcatctcg ccccgcgagg cgctggccat ggacccgcag 36660

cagcgcctgc tgctggagac gtcctgggag gcgctggagc gggccgggat ccccccggac 36720

tcgctgcgcg gcagccgtac cggggtgtgc gtcggggcgt ggcacggcgg ctacaccgac 36780

gtcgtcgggc agcccccggc ggaactggag ggccacctgc tgaccggcgg ggtggtcagc 36840

ttcacctcgg ggcggatctc gtacgcgctg ggcctggagg ggcccgcgtt gacggtggac 36900

accgcctgct cgtcctcgct ggtggccctg cacctggcgg tgcgggccct gcggcagggc 36960
gagtgcgacc tggcgttggc cggcggggcg acggtgctgg ccagcccggc ggtgttcgtg 37020
cagttctcgc ggcagcgggg gctggccccg gacggccggt gcaaggcgtt cgccgactcg 37080 
gcggacgggt tcgggccggc cgagggggtc ggcatgctgg tcgtggagcg gctgtcggac 37140 
gccgtccgcc acgggcgccg ggtgctggcc ctggtcaccg gcacggcggt caaccaggac 37200
ggggcgagca acggcctcac cgcccccagc ggcccggcgc aggagaaggt gctgcgccag 37260
gcgctcgtgg acgcccgggt gacggccgcc gacgtcgacg cggtcgaggc gcacggcacc 37320 
ggcacccggc tcggcgaccc gatcgaggtg cgggccctga tgaacgtgta cggtgccggc 37380 
cggcccgccg accgtccgct ctggctcggt tcgctgaagt ccaacatcgg ccacacccag 37440 
gcggcggccg gggtcggcgg ggtcatcaag acggtgctgg cgatgcggca cggcgtcctg 37500 
ccgcccaccc tgcacgtgga cgccccgacc accgaggtcg actggtccgc cggccaggtg 37560 
gccctgctgc gggcagagac accgtggccg gacacgggtc gcccgcgccg cgccggggtc 37620 
tcctccttcg gggtgagcgg caccaacgcg cacgtggtgc tggagcaggc ccctgggccc 37680 
gccgccgccc cggcgggtga cgccccgccc gccgagaccc ggcccgtcgg cgacccgccg 37740 
ccggtcgtac cgctggtgtt gtccgccagg tcgcagccgg cgctggccgg gcaggcccgc 37800 
cggctgcgcg acctgctggc cgcagcgccg gagaccgacc tcgccagcgc cggactcgcc 37860 
ctggccaccg cgcggtcggt gttcgaccac cgggcggtgg tgacggccgc cgggcgaccg 37920 
caggcgctcg acgcgctcga cctgctggcc ggcggcgaac ccggaccggc ggtcacgacc 37980 
ggcgtcgccg cccccaccgg gcgcaccgtg ttcgtctttc ccgggcaggg gacgcactgg 38040 
gccggcatgg gtgccgacct gctcgaccag tcaccggtgt tcgccgagtc gatgcgacgg 38100 
tgcgagcagg cgctgtcggc gcacaccgac tggaagctcg gcgaggtgat ccggggcgcg 38160 
gccggcagcc cgccgctgga ccgcgtggac gtgctccagc ccgtctcctg ggcggtgatg 38220 
gtgtcgctgg cgcaggtgtg gcggtcgctc ggcgtcgagc cggacgcggt ggtcggccat 38280 
tcccagggcg agatcgccgc cgcggtggtc tgcggcgcgc tgaccctgcc ggacgcggcc 38340 
cgggtggtcg cgctgcggtc ccaggtcatc ggtcgggtgc tctccggtcg cggcggcatg 38400
gcgtccgtcc agctgccggc ccgggaggtc gcggggcggc tggccgcctg ggcgggccgg 38460
ctcgacgtcg cggccgtcaa cgggccacag tcgaccgtcg tgtccggtgc cgccgacgcg 38520 
gtcaccgaac tggtcgaggc gttcgcggcc gaggacgtcc gggtgcggcg gatcccggtg 38580 
gactacgcgt cccactcgac gcaggtggac cggctgcgcg ccgagctgct caccgtcctg 38640 
ggcccggtcg acgcccgtcc ggcgcaggtg cccttctact cgacggtgca gggcgggcgc 38700 
gtcgacactg ccggcctgga cgccggctac tggtaccgca acctgcgggg gcaggtccgc 38760
ttcgaggaga ccgtgcgggt gctgctcgac gacgggcacc gcgccttcgt cgaggccgcc 38820
gcgcacgccg tcctcgtacc cgcgatccag gagctggggg acagcgccgg cgtccgggtg 38880
gtggccgtgg ggtcgctgcg ccgggaggcg ggcggcctgg accggctcct ggcctcggcg 38940
gccgaggcgt tcacccaggg ggtggccgtg gactggtccc gggctctggc cggggccgcg 39000
cgcgtcgccg tggacctgcc cacgtacgcg ttccagcggc aacgctactg gctggagccc 39060
gccgcgcagg cggactccgg cccggccggg gacggctggc gctaccgggt cggctggcgg 39120
cggcttcagc gcaccggcgc cgcgccggcc gaccggtggc tgctggtgac cggcccggag 39180
cagccggcgg agctggtcga ggcggtgcgc gacgcgctca ccgcgcgggg cgccgaggtg 39240
cgcctggtga ccgtcgagcc gaccagcacc gaccgggccg cgtgcgcggc gttgctcacc 39300
gcggccggtg cgggcggggc gacccgggtg ctgtcgctgc tcggcaccga tcgtcgcccg 39360
caccccgacc acccggccgt gtccgtcggc gccgccgcga cgttgctgct gacccaggcc 39420
gtcgccgacg ccctgccggc cgcccggctg tgggtcgtca cccggggcgc ggtctccgtc 39480
gggcccggcg agaccgccga cgagcgccag gcgcaggtct gggggttcgg ccgggtcgcg 39540
gccctcgaac tgccccgcac gtggggcggg ctcgtcgacc tgcccgccga cgcggacggc 39600
ccggtgtggg aggcgttcgt ggacgtgctg gccggggacg aggaccaggt cgcgctgcgc 39660
ggcccggtcg ggtacggtcg ccggctccgg cgcgcccccg cgctacccgc gaagcggcgg 39720
taccggccca ggggcaccgt cctggtcacc ggcggcaccg gcgcgctcgg cgcgcacgtg 39780
gcccggcggt tggccgccgg cggggccgcg cacctcgtgc tcaccagccg gcgcggggcc 39840
gacgcccccg gtgcggccgg gctggtcggg gaactccggg cgctgggcgc cgaggtgacc 39900
gtcgcggtct gcgacgtcgc cgaccgggcc gccgtggcgg cgctgctcgc cgggctgccc 39960
gccgacgcgc cgctgagcgc ggtcttccac accgcgggcg tggcgcactc gatgccgatc 40020
ggcgagaccg ggctcaccga cgtcgccgag gtgttcgccg ggaaggtcgc cggagcccgc 40080
cacctcgacg aactcacccg ggggcacgac ctggacgcgt tcgtcctgta ctcgtcgaac 40140
cfccrcraccrtcrti crccrtaccracrcr ccraccaacGTC CTQcc c ti ccrac 40200
gcgctcgccg aacggcggcg cgccgccggg ctgaccgcca cctccgtcgc ctggggcctg 40260
tggggctccg ggggcatggg cgagggcgac gccgaggagt acctgagccg ccggggcctg 40320
cggccgatgc ctcccgagcg tggcgtggac gccctcctgg ccgccctgga ccgggacgag 40380







accttcgtcg ccgtcgccga cgtggactgg acgctgttca cggccgggtt caccgcgttc 40440 

cggcccagcc cgctgctcgg cgacctcccg gaggcccgcg cgacgctggc cgacgccgga 40500 

cccgcgggct ccgacctgcc ggcctggcac gccgccgcga gccccgacga acgccgccgg 40560

ggcctgctcg acctggtacg ccggcaggtc gccgccgtcc tcggccaccc ggggcccgag 40620 

cacgtcggcc ccgacgccgc gttccgggag atcggattcg actcgctgac cgccgtcgac 40680 

ctggccaagc ggctcagggc ggcggtcggc gtgccgctgt ccgccaccct cgtcttcgac 40740 

caccccaccg cgacggcggt cgccgagcac ctggccgggc tgctcggtcc cgcgccggcc 40800 

ggcggcgacc cgcgcgaggc cgaggtgcgc cgggccctgg ccgacctgcc gctggcccgg 40860 

ctgcgggacg ccggcctact ggacggcctg cttgcgcttg cggggctgga cgccgacgcg 40920

gtgccggacg ggcccgagcc ggctcccggc gacgccatcg acgaactcga tccagaggag 40980 

ctggtgcgcc gggtgctgga caacgccagc tcctgacccg ttccctcttc cccccgagga 41040 

gcccgcccat ggtcatgccc cccgacaagg tgatcgaggc gctgcgtgtc tccgtcaagg 41100

agacggagcg gctgcgccgg cagaaccacg agctgctcgc cgccctgcac gggccgatcg 41160 

ccgtcgtggg catggcctgc cgctacccgg gcggggtgtc ctctccggag gacctgtggc 41220 

ggctggtcga gacgggcacg gacgcgatcg gcggcttccc caccgaccgt ggctgggacg 41280 

tcgacgccgt gtacgacccg gatcctgagt cgcggaacac cacctactgc cgggagggcg 41340 

ggttcctggc cggggcagga gacttcgacg ccgcgttctt cggggtgtcg ccgcacgagg 41400 

ccgtggtcat ggacccccag cagcggctgc ttctggaggt gtcctgggag gcgctggagc 41460 

ggtccgggac cgacccgcac agcctgcgcg gctcgcgcac cggggtctac gtcggtgcgg 41520 

cccaccaggg gtacgcggtc gacgccggtc aggtgccgga gggcgcggag gggttccggc 41580 

tgaccggcag cgccgacgcc gtcctgtccg gacggatctc gtacctgctc gggctggagg 41640 

gtccggccct gaccgtcgag acggcctgct cgtcctcgct ggtggcggtg cacctcgcgg 41700 

tgcaggcgct gcgccggggc gagtgcgggc tggcactggc cggcggggtc gccgtgatgc 41760 

ccgacccggc ggcattcgtg gagttctccc ggcagcgggg cctcgcggcg gacgggcgct 41820

gccgggcgtt cggggcgggc gcggacggca ccggctgggc ggagggcgtc ggtgtgctgg 41880 

tcctgcaacg gctctccgac gcggtgcgcg acggccgctg ggtgctgggc gtgatccggg 41940 

gttcggccgt caaccaggac ggggccagca acgggctgac cgccccgagc ggccccgccc 42000 

agcagcgggt catccggcag gcgctgaccg acgcccggct cggcgccgac cagatcgacg 42060 

cggtcgaggc gcacggcacg ggcacccggc tcggcgaccc gatcgaggcg caggcgctga 42120 
tcgccgccta cggcgccgac cggaccccgg accggccgct ctggctcggc tcgttgaagt 42180

cgaacatcgg gcacgcccag gcggcggccg gcgtcggcgg cctgatcaag atgctcctgg 42240

cgatgcgggc cgggacgctc ccacccaccc tgcacgccga cgtcccgacc ccgctggtcg 42300

actggtccgc cggtgtcgtc cggctgtcga ccggggtggt gccctggccc gcgttgcccg 42360

gggcgccccg cagggccggg atctccgcgt tcggggtgag cggcaccaac gcgcacgtga 42420

tcgtcgagca gccgccgccg gtcccggtcg acgacccggc gccacccacg aggaccctgc 42480

cgctggtgcc gtgggtgctc tccggccgga cggaggcggc gctgcgcgcc caggcggacc 42540

ggttgcgtac gcacctggcg gcgcaccccg acgcggaccc gctggacgtg ggattctccc 42600

tggccaccag ccgggccgcg ctggagcacc gggccgtgct ggtggccgcc gaccgcgacg 42660

gcctgctccg cctcgtcgac gcgctggccg ccggcgagcc ggcggcgggc ctgatccggg 42720

gcacggtacg tcacgatcgc cggaccgggt tcctcttcgc cgggcagggc ggccagcgcg 42780

tcgggatggc gcgcgaactg tacgaggcgt tccccgcctt cgccgacgcc ctggaccagc 42840

tcgccgcccg gctggaccgg cacctcgatc gtccgctgct gcgggtgctg ttcgccgagc 42900

cggggtcgga cgacgcccgg ctgctcgacg gcacccggta cgcgcaggcc gccctcttcg 42960

ccgtcgaggt ggcgttgttc cgactggtcc acggctgggg ggtccggccc gacgtgctgc 43020

tcggccactc ggtgggcgag ctggcggccg cgcacgtggc cggcgtactc gacgtggacg 43080

acgcgtgcga gctggtcgcg gcgpggggcc ggctgatggg ggagctgccg tcgggcggcg 43140

cgatggtggc ggtccgggcc accgaggagg aggtcgggcc cctgctcgac gggcagcggg 43200

tcgcggtggc ggcggtcaac ggcccgcgct cggtcgtggt ctccggcgac gaggaggcgg 43260

tgctggccgt ggccgcccgg tgcgccgccc tcggccaccg gacgcgacgc ctcaacgtca 43320

gccacgcgtt ccactccccg cacgtggagg cgatgctgga gccgttccgg cgggtggcgc 43380

ggggcctgac gtaccatgcc ccgacgatcc cggtggtgtc gaacgcgacg ggccggctcg 43440

ccaccgccga cgcgctgcgc gaccccggtt actgggtccg gcacgtccgc cagcccgtcc 43500

ggttccggga cggggtgcgg gccgcccgcg accagggggc caccgccttc gtcgggctcg 43560

QCCCQcracQcr qqtorctqtqc qcqttqqccq aqqaqtqcct cgggcccacc ggcgacgtgc 43620

tgctgctgcc ggtgctgcgc cccggtcggc cggagcccgc caccctgctg gccgccctgg 43680

ccggggcgta cgccggcggc gcggaaatgg actggtcccg ggtgttcgcg ggcaccggcg 43740

cgcgcagggt cgagctgccc acgtacgcct tccagcaccg gcgctactgg ctggcgccgg 43800
gcccgccgtc ggcccgccgc gacgacgcct ggcggtaccg gatcgcctgg cggcccctgc 43860

cgaccgtgcc cgccgccgcc gggaccgaga cggtggccgg ggcgtggttg ctggtggtcc 43920 

ccgcccacga cggcgtcgcg tcgctcgccg acgccgccga gcgggccgtg caccggggcg 43980 

gggccacggt cacccggctg acggtggacg ccgccgacgt ggaccgggac accctcgccg 44040 

ccgtgctgac cgaggccgcc gccgacgcgg acggcgggcc ggacggggtg ctctgcctgc 44100 

tgggcctcga cgaccgggca catccccggt ccgcctcggt gccccgcggg gtgctggcga 44160 

ccctgtccct cgcccaggcc ctgaccgacc tgggggcctc cgcgcggctg tggtgcgtga 44220 

cccggggggc ggtcgccgtg acgcccggcg agtccccgtc ggtcgccgga gcccagttgt 44280 

ggggcttcgg ccgcgtggcc gcgctcgaac tcccccggtc ctggggcggc ctggtggacc 44340 

tgccggtcga cccggacgac cgggactggg acctgctgcg gcgcgcgctg cgcggcccgg 44400 

aggaccaggt cgcggtccgg ggggcggtcg ggtacgcccg gcggctggtc cccgcgcccg 44460

cgccccgggc cgagcgggcc tggcgtccgc gcggcacggt cctggtgacc ggcggtacgg 44520 

gcgcgctcgg cgcgcacacg gcccgctggc tggcgcgcaa cggcgccacg cacctcgtcc 44580 

tcaccagccg ccggggcggg aacgcccccg gggtcgccgc gctgcgggcg gaactggtca 44640 

cgctcggtgc cgaggtgacc gtggtcgcct gcgacgtcgc cgaccgggag gccgtggccg 44700 

gcctgctcgc cgggattccc cgcgccgctc cgctcaccgc cgtgttccac gcggcgggcg 44760 

tgccccaggt gacgccgctt cacgagacga ccccggagtt gttcgcgcag gtctgcgcag 44820 

gcaaggtcgc cggggcggtg cacctgcacg agttggccgg tgacctggac gccttcgtca 44880 

ccttcgcctc cgccgccggg gtgtggggca gcggcgggca gtgcgcgtac gctgcggcca 44940 

acgccgccct cgacgcgctc gccgagcgtc gtcgcgccgc agggctgccc gcgacctccg 45000 

tcgcctgggg ggtctggggc gggcccggca tgggggcggg cgcgggggag gagtacctgc 45060 

gccgccgggg cgtccgggcg atgcccccgg cagccgccct cgccgccctc gggcggatcc 45120 

tggacgccga cgagaccggg gtgacggtct ccgacaccga gtggggccgg ttcgcgtccg 45180 

gcttcgccgc cgcgcgtccc gccccgctgc tcgccgagct gccgggcggg gacgtcgatc 45240 

cggccggccc ggcgcaccgg gcgcagccgc ccgtgccccg accggccccg gcagccaccg 45300 

accgccccgg gctgctggcg ctggtccgcg ccgaggccgc cggggtgctg gggcacgacg 45360 

gtgccgacga cgttccggcc gacgcggagt tctccgccct cggcttcgac tcgctcgccg 45420 

ccgtccagct gcgccgccgg ctcgccgagg ccaccggcct gagcctctcg gccccggttc 45480 

tgttcgacca ccgcacccct gacgcgctcg ccgcgcacct gcacggcctg ctcaccggcg 45540 
cggcgggcgg gccacccgcg ccggccgccg ggagcgccct ggtcgagatg taccggcggg 45600

ccgtcgccac cggccgcgcc gccgaggcgg tggaggtgct cggcaccgtc gccacgttcc 45660 

ggccggtgtt ccggtccccg gacgaactgg gcgagccacc ggccctcgtc ccgctcggca 45720 

ccggggcggg gggacccgcg ctggtctgct gcgcgggcac ggccgcggcg tccggccccc 45780 

gcgagttcac ggcgttcgcc gccgcgctgg ccggtctccg ggacgtcacc gtccttccgc 45840 

agaccggctt cctgcccggc gagccgctgc ccgccgggct ggacgtgctg ctcgacgccc 45900 

aggccgacgc cgtcctggcc cactgcgccg ggggaccctt cgtcctggtc ggccactcgg 45960

ccggggcgaa catggcgcac gcgctgacgg tccgcctgga ggcgcggggc gcggaccccg 46020 

ccgcgctggt gctgatggac atctacacgc ccgccgcccc gggggcgatg ggggtgtggc 46080 

gcgaggagat gctggcctgg gtcgccgagc ggtccgtcgt ccccgtcgac gacacgcggc 46140 

tgaccgcgat gggcgcctat caccggctgc tcctggactg ggcgccccgg ccgacccggg 46200 

cacccgtgct gcacctgtat gccggtgaac cggcgggcgc ctggccggat ccccggcagg 46260 

actggcgttc gcgcttcgac ggcgcgcaca ccagcgccga ggtgcccggc acccacttct 46320 

cgatgatgac cgagcacgcc cccgtcaccg ccgcgaccgt gcacaagtgg ctcgacgagg 46380 

tgtgcccgcc ccgcgttccg tgacccgtac gccgggtccg tcccggcgag tccgacgaca 46440 

gcaggagagg aagcgcatga tcacagtccc gcccgacggg gatcccgcga cctgggcccg 46500 

ccggctgcaa ctgacccgcg ccgcgcagtg gttcgccggc aaccacggcg acccgtacgc 46560 

gctgatcctg cgcgcggaga ccgacgaccc gaccccgtac gagcagcggg tggccgccca 46620 

gccgctgttc cgcagcgagc agttggacac ctgggtgacc ggggacgccg cgctggcccg 46680 

ggaggtgttg accgacgacc ggttcggctg gctgacccgg gctgggcagc ggcccgccga 46740 

gcggaccctg ccgctggccg gcacggcact ggaccacggg ccggaggccc ggcgtcggct 46800 

ggacgcgctc gccgggttcg gcgggccggt cctgcgggcc gacgccgcag gggcgcgtac 46860 

ccgggtcgtg gagaccaccg cggtcctgct cgacgggatc ggggagcggt tcgacctggc 46920 

cgtgctcgcc cggcggctgg tcgctgcggt gctggccgac ctgctggggg tgcccgccgc 46980 

gcggcggggc cgcttcgccg aggcactcgc cgccgccggc cgtacgctgg acagccggct 47040 

gtgcccgcag accgtggcga ccgctctcgc caccgtcgcc gccaccgccg agctgaccga 47100 

cctgctgggc gaggtgccgc ccccgccgtc gctgtccccg tccgccgccg gctccgggcc 47160

gccgcgtccg tccgcagccg gttcctggcc gccgctgccg gctgacgacc ggacggccgc 47220 
cgcgctcgcg ctggcggtcg gcacggccga accggcgatc accctgctct gcaacgcggt 47280
cggtgcgctg ctcgaccgcc ccgggcagtg ggccctgctc ggtggggacc tcgaccggtc 47340 
cgccgccgtc gtcgaggaga ccctgcgctg ccttccgccg gtgcgcctgg agagccgcgt 47400 
cgcgcagcag gacgtcaccc tgggcgggca gttcctcccg gcggacagcc acctggtcgt 47460 
gctggtcgcc atggcgaacc ggggtccgcg cgcggcgacc gccccgagcc cggacgcgtt 47520 
cgaccctggc gggtcgcgcg tcccggcccg cgacgtggtg ggcctgccgc agcttgccgg 47580 
cgccgggccg ctgatcagac tcgtcgtcac gaccgccctg cggaccctcg ccgaggcgct 47640 
gcccacgctg cggcgggcgt ccggcggcgt ccggtggcga cgctcgcccg tcctgctcgg 47700 
ccacgcccgc tttcccgtcg cacgggcgga gagcggcgaa cagcggtccg acgaccgccc 47760 
ggcgctggag gaggcgatcc gatgcgcgtc ctgatgacgt ccttcgcgca caacacccac 47820 
tactacagcc tggtgccgtt ggcctgggcg ctgcgcgcgg ccggccacga ggtacgggtg 47880 
gcgagccagc cctcgctcac cgacaccatc gtgcggtcgg ggctgaccgc ggtgccggtc 47940 
ggcgacgacc aggcgatcat cgacctgctc gccgaggtcg gcggcgacct ggtgccgtac 48000 
cagcggggac tggacttcac cgaggcccgt cccgaagtgc tgacctggga gtatctgctc 48060 
gggcagcaga ccatgctcac cgcgctgtgc ttcgcgccgc tcaacggcgt ctccacgatg 48120 
gacgacatgg tcgccctggc ccggtcctgg cagcccgagc tggtgatctg ggagccgttc 48180 
acctacgccg ggccggtcgc ggcgcgggtc gtcggtgcga cgcacgcccg gctgctctgg 48240 
gggccggacg tggtcggcaa cgcccggcgg ctgttcaccg agagcctggc gcggcagccg 48300 
gatgagcagc gcgaggaccc gatggccgag tggttgcgct gcaccctgca ccggtacggc 48360 
tgcgagctcg gcgacgacga ggtggagacc ctggtcaccg gcgggtggac catcgatccc 48420 
accgccgaca gcacccggct tcccgtcccc gggcgtcggg tggccatgcg gtacaccccg 48480 
tacaacagcc cgtccgtggt gccggagtgg gtggccaagg ccgaccggcc ccgcgtctgc 48540 
ctcaccctcg gcgtgtcgag ccgggagacg tacggcaggg acgtggtctc cttccaggag 48600 
ctgctcggcg ccctcggcga cctggacgtc gaggtcgtcg cgacgctcag cgacgcccag 48660 
cgcgaggacc tgggtgacct gccggacaac gtccgggtgt gcgacttcgt gccgctggac 48720 
gtgctgctgc cgacctgtgc cgcgatcatc caccacggcg gggcgggcac gtggtcgacg 48780 
gccatgctct acggggtgcc gcagatcatg atcgcgtcgc tgtgggacgc cccgctcaag 48840 
gcgcagcagg cggagcgact cggcacgggg atctcgatcc cgccggagcg gctcgacgcc 48900 
ccgacgctgc gggcggccgt cgtccggatc ctcgacgacc cgtcgatcgc cgccgccgcc 48960 
cgccgtcagc gcgacgagct gcgtgccgcg ccgtcgccgg ccgaggtggt ccgcatcctg 49020

gaacgcctcg tcgcggacga ccggcccggc cggccggccg gaaccgccac cgaccactcc 49080 

tgaaaggaac gatgtccatg atgtacgcgg acgccatcgc cgaggtctac gacctgatct 49140 

accagggcaa gggcaaggac tacgcggcgg aggcggcgga gctggaggcg ctggcccggg 49200 

cccgtcggcc gcacgcccgg acgctgctgg acgtggcgtg cggcacgggg ctgcacctgc 49260 

ggcacctggc ggggctcttc gacgacgtgg gcggcatcga gctggcaccg gacatgctga 49320 

gcatcgccca gcagcgaaac cccggggcgg ccctgcacct cggcgacatg cggaccttcg 49380 

acctggggca ccgctacgac gtcatcacct gcatgttcag ttcggtgggc cacctggcca 49440 

ccacggccga gctggacgcg acgttggccc ggttcgccgc gcacctgtcc cccgggggag 49500 

tggcgatcgt cgagccgtgg tggttcccgg agaccttcac ccccgggtac gtgggcgcga 49560 

gcctggtgga ggtcgacggc cgtaccatct cgcgggtctc ccattcggtg cgcgagggcg 49620 

gcgcgacccg gatcaccgtg cactacctcg tggccagccc cggcggggga gtccggcact 49680 

tcgacgagag ccacctgatc accctcttcg aacggtccga ctacgaacgt gccttcgccc 49740 

gggcgggttt cacgacggag tacctgacgc ccggcccgtc cggccgcggt ctgttcgtcg 49800 

gcgtccaccc ctgacgaccc gttgccggtg cgcctcgacc cgcgcccccg acccgctgga 49860 

ggaacagatg ccagacaccc ccgagctgaa ccggatactc gacgcgatcc tcgcccagga 49920 

gaccgacgcg cgggagctgg cggccctgcc gctgccctcc tcctaccggg ccgtgacggt 49980 

gcacaaggac gagacgggga tgttcctggg ccttccccgc caggagaagg acccgcgcaa 50040

gtcgctgcac acggaggagg tgccggtgcc cgagctgggc cccggggagg ccctcgtcgc 50100 

ggtcctggcc agctcggtca actacaacac ggtctggtcg tcgttgttcg agccgctgcc 50160 

caccttcggc ttcctggagc gctacggccg gctctccgag ctggcccggc ggcacgacct 50220 

gccgtaccac atcctcggct cggacctggc cggcgtggtg ctgagggtcg ggcccggcgt 50280 

caaccgctgg cggccgggtg acgaggtcgt ggcgcactgc ctctcggtgg agctggagtc 50340 

cgccgacggc cacggcgaca ccatgctcga cccggaacag cggatctggg gcttcgagac 50400 

caacttcggc ggcctcgccg agatcgcgtt ggtcaaggcg aaccagctga tgcccaaacc 50460 

cgaccacctg acctgggagg aggccgccgc gccgggactg gtcaactcca ccgcctaccg 50520 

ccagctggtc tccggcaacg gggcccggat gaagcagggc gacaacgtcc tcgtctgggg 50580 

ggccagcggc ggtctcggcg cgttcgccac ccagctcgtg ctggccggcg gggccaatcc 50640 
cgtctgcgtg gtctccagcc cgcgcaaggc cgacatctgc cgtcggatgg gcgccga^gc 50700

cgtcatcgac cgggtcgccg aggactaccg cttctggtcc gacgagcgca cccagaatcc 50760 

ccgggagtgg aagcgcttcg gcgcacgcat tcgggagctg accggaggcg aggacgtcga 50820 

catcgtcttc gagcaccccg gccgggagac gttcggcgcc tcggtctacg tgacccgcaa 50880

aggaggcacc gtggtcacct gcgcctcgac gagcggtttc gagcacgtct acgacaaccg 50940 

ttacctgtgg atgtccctga agcgcatcgt cggcacgcac ttcgccaatt accgggaggc 51000 

gtgggaagcc aaccggttgg tggtcaaggg caagatccac ccgacgctgt cgcgctgcta 51060 

cccgctggag gaggtcggcc aggcggtcta cgacgtccat cacaacctgc accagggcaa 51120 

ggtcggcgtg ctcgcgctcg cgccgcgcga ggggctcggg gtccggaacc cggagctgcg 51180 

ggaatgccat cttgccgcga tcaaccgctt ccgggtgccg gcctgacggg ccgcctttga 51240 

cgcccggggg cgcggcggct ggcatgcggg cgaaccgggt gttaccgggc ggaagcaatt 51300 

ctcactgcga gtagttgcag ggtgcaccgg ctactgtgaa catatcgata gtcttatgta 51360 

gccatcgacc cccctgaatc ctctattcgt tgtgtgcgag gtggttggac gcatgactgg 51420 

taccagcatt cccccgcggg accacgaact ccgattcttc gaacttctgg ccagggaggc 51480 

acccttaccg cagtacgagg aactggtgca ccaggcgcac cgggacggag tggaccaggc 51540 

cacgctcgac cgggtgatga tcgccaagcg actcgcgttg gagcttcgag aggtcatcgg 51600 

gaggcggtgt cagcggcagg cggagctggc cgccctcgtc gacaccgccc gtgacctcgc 51660 

cggggcgacg aacctggagg ccgggctgca gctggtggtg cggcggaccc aactgctgct 51720 

cgccggggac gtggcgttcg tcagcctcgt cgacgacgcg accggcgaat cctacgtcgc 51780 

ctcggccgtc ggggcggcca ccgcgctgac cagcggctac cggctgccct ggcgcgacgg 51840 

gctggtcgtg gccgccgcac cgcgcgagcc actctcctgg acggcggacc acctcgccga 51900 

cgagcgcctc gaacgacacc cggccgccga cggcctggtc cgcgcggaag ggctgcacgc 51960 

ggtgctgtcc gtggttctga gcgtcgaggg ccggcacctc ggcaacctgc acgtcggcca 52020 

ccggcaggtc cgccacttcg ccccggacga ggtcgcgtcg ctgcgcctgc tcgccgatct 52080 

cgcggcgacg gcagtggagc ggatcatgct gctcgacgac acgtgggccg aactcaagca 52140 

ggcccagcag gaggcggcca gggcccgagc cgagctgaac gcggtccgca tggccgaccg 52200 

cctgcaaccc gaactcgtcc agctcatcct cgacggcggc gaactcgacg acctggtggg 52260 

cagcgccgtg cggcgactgg gcggcgccct gcacgtgcgt gaccgggcca acggcgtgct 52320

ggcggcggcc ggtgaaatcc ctgtcccgaa cgagcgggaa ctggcccgag tgcggctgaa 52380 
cgcccacgcc accggccgac ccggccgcct gaccaccggt tcctgggtgg tgcccctggc 52440

ggcccgcgcc ggtgacctcg gctgtgtgtt gttccacgcc gacgagccgt ccgacgacga 52500 

gcggatggcg gccctgccgg cggtcgcgca gaccgtggcg ctgctgatga ccaggaacgg 52560 

cgggagccac ggccagccgg gcgacgggct cctggaggac ctgctcggcc cgtggccgga 52620

cctggagcgg ggcgggaagc gccgtcggta cacacctgtc gagttcgacc ggccctacgt 52680 

cgtcgtggtc gcccgccccg agggcgccac ctcgccccgg gtgttcgaac gggcggtctc 52740 

cgtcgcccac ggcctgaacg gcatgaaggc catccgggac ggccaggcgg tgctgctgct 52800 

gcccggtgac gacccggggg cccgggcccg ggacgtgacg cgggaactga gcgggctgct 52860 

cggcctaccg gtcacggccg gaggcgccgg accggtgcgc acggcggact cggtcagccg 52920 

cacctaccag gaggcggccc ggtgcgtcga cgccctggcc gcgctggacg cgaaggggcg 52980 

ggcggcctgc tgacgggacc tgggcttcct cgggctgctg gtcgccggcg gccacgacgt 53040

caccggtttc gtcgaccggg tcatcggacc cgtgctgagc tacgacgcgc gccggctcac 53100 

gaatctcagg gagaccctcc agacctactt cgactcggcg ggcagccgta cccgggcggc 53160 

ggagatgctg catctgcatc cgaacaccgt gtcccgccgg ctggaccgca tctcccagct 53220 

gctcggccgg gactggcggc agccggaccg ggccctcgac acgcagctcg ctctgcgcct 53280 

gcaccggatc cgtggcctgc tctgccagga acggggctac ccgggcccat cgcaggagcc 53340 

ggaccaaccc gcgcggccta tccggcggca ccgccctcca gcatccgcag ggcgtgcgcc 53400 

acggacgcca aggtgacgtg ccggtgccag ccttggtatg accgaccctc gaagtcttgg 53460

atgccgacgt cgaggctgac ctgggagaag tcggtctcga cccgccgggt gagcttgctc 53520 

agccgcagca gtgggccgta cccggcgtcg gtcatgttgg tcagccacat ctgccgtacg 53580 

ccgcgctcgt aggtctgcca cttgccgagc agtgtcaggg gcagcccggg cgcggcggcg 53640 

cgcgccgccc ccggcggggc cggggcggac ggaccgggcg ggcgggcacc ggacaggccc 53700 

ggccaataga cctgtagcgg tgcgaccagg ctcgtgcgcc gtgcgccggg gctggccggg 53760 

tcgatccact ccaccggacg gcgctgggcc cgcgtcaggc tgagcaggtg ctcggcggag 53820 

gccgccgcga cccggttctc gcgcgggccg ggcccggcgg ccagcagggt gcagccgctg 53880 

ttgatccgta gcaggaaggg cagacccgcc gtggtgaacg cctcgatcag cgggggcagc 53940 

gccgagtgcc gggcgtccat taccaccggg cgagggccga ttccccaggc cgcggccttc 54000 

agcaccgcct gcaccgccgc gccgtcgctg gtcgtgccgt cctcgtccgc cggtacgctc 54060 
gcgcgggcgc ggttgtcctg gagccaaccc ttaccgatgg acaactgcca gttgatgggc 54120

gcggcgacgg tctccgaggc cagccagagg ccgtagctct gctggctgtt gaccgtctcg 54180 

cccagcgcgg gcacgtaccg gcgttccacg ccgaccgagt gccggccggt cttcggcacc 54240 

agcatcgacc gcaccaccca ggcccggggc gacagcgtcc ggtccaggtg gccggcgagc 54300 

gcggcacgga cggtctccca gtcccaggtg gagcaactga tgaagtggtg catgctctgt 54360 

gccgccgccg gatcgtcggc gatggcggcc aggttgcgca tggtcttgcg gccggaggcg 54420 

gtcagcagcc cccggacgta cagttcgccc ttgcgtcgct ggtcggcgcg gggcagcgag 54480 

gccagcaggg cggcgcagaa ccgggacacg gtgtccggat cggaccttgc cgcggtcacc 54540 

tcctcgcgga cgtcgagcgt cggcaccatc actcccctcc tgcggcggga cgatgtgctg 54600

atcacggcag acccggcccc ccggtcccac catcgcccgg cgacgcctgc cttgcccagg 54660 

tgcgtcggaa acacacttgg cgacgacggc gatcccgcac ccaccgcagc cccgccggtg 54720 

cgtgtccgtg gcgggcgggc gcgggccggc gacccggtga cgcccgcaca catcgcggct 54780 

tcggccgcgg cgaagtgtgt gaccggcgaa cctcgcttcc cgcgccgcca tccggaagcc 54840 

tgcaagggac cggaagcctt ccaacgagat tggcatcccc ccggcaaagg acccagatga 54900 

cctccgcagc gcaccattcc ccgcatccgg cgaaggccga cgccctgatg gacgacgccc 54960 

acgccgacat cggggccgat gccgaggccg acggfccgacg gctcgaccgg gccgccctgc 55020 

ggcgggtcgc cgggctgtcg accgagaggg ccgacgtcac ggaggtcgag taccggcagg 55080 

tgcggctgga gcgcgtcgtc ctggtcggcg tgtggacctc gggcaccgcc gacgaggccg 55140 

aacggtccct cgccgagctg gcggcactcg ccgagaccgc gggagccgtg gtgctcgacg 55200 

gggtgatcca gcgccgcgac cggcccgacc cggcgacgta catcggctcc ggcaaggcgc 55260 
gggagttgcg ggacatcgtc caggaggtgg gggccgacac ggtgatctgc gacggtgagc 55320

tgagcccggc ccaactggta cgcctcgaag aggtcgtcga cgccaaggtg gtggaccgca 55380 

ccgcgctgat cctcgacatc ttcgcccagc acgccacgtc ccgcgagggg aaggcgcagg 55440 

tggccctggc acagatgcaa tacatgctgc cgcggctgcg cggctggggc cagtcgctct 55500 

cccggcagat gggcggaggt gccggcggcg gtggcatggc cacccggggg cccggcgaga 55560 

ccaagatcga gaccgaccgg cggcgcatcc acgagaggat ggcccggctc cgacgggaga 55620 

tcgcggagat gaagtccggc cgcgaactca agcgccgcga tcggcggcgc aacagcgtcc 55680

cgtcggtcgc gatcgccggt tacaccaacg ccggcaagtc ctcgctgctc aaccggctca 55740 

ctggcgcgag cgtgctggtg cagaacgcgc tgttcgccac cctcgacccg acggtgcgcc 55800 
gggccaccac cccgagcggg cgcagctaca cgatcaccga caccgtcgga ttcgtccggc 55860
acctgccgca ccacctggtg gaggcgttcc gctccaccct ggaagaggtg gccgaggccg 55920
acctcctgct gcacgtggtg gacggcgccc accccgcccc gctggagcag ctcgcctcgg 55980
tgcgcgcggt catccgggac gtggacgcgg cgggagtgcc cgaactcgtc gtgatcaaca 56040
aggccgacgc cgccaccccg gccgccctgg ccgcgttggc ggaggccgag ccgcaccacg 56100
tcgtcgtctc ggcccgcacc ggtcagggca tcgacacgct tcggcagttg ctggaggccg 56160
cgctgccgca ccgggaggtc cgggtcgacg tcctgatccc gtacgtcgcg ggcagcctcg 56220
tggcccgggt gcacgccgac ggcgaggtgc tggccgagga gcacacggcc gacggcaccc 56280
tgctgcaggc gcgggtggcc cccgacctgg ctgccgagct cagcgcgtac gccaggacct 56340
gagcgtcgcc gccccccggg cggcatccgg agctggcgaa gctgtggccc gtagagggag 56400
gcaggcgatg aagcgagatc tcggggatct ggcactcttc ggaggacacg ccagcttcct 56460
ccagcagatc cacgtcgggc gccccaaccg gatcgatcgg gccaggctgt tcgaccggct 56520
gtcctgggcg ctcgacaacg agtggttgac caacaacggg ccgctggcac gggagttcga 56580
ggagcgggtc gccgacatgg tcggggtcgg caactgcgtg gcgacgtgca acgccacggt 56640
ggccctccag ctgctcgcgc acgccaccga gctgaccggt gaggtgatca tgccatcgct 56700
caccttcgcc gcgaccgcac acgcggtgcg ctggctcggg ctggagccgg tcttctgcga 56760
catcgacccg cgcaccggat gcctcgacca cgtggcggtc gccgcggcca tcacgccgcg 56820
cacgtcggcg gtcttcggcg tccacctctg gggccgcccc tgcgacgtca acgcgctgga 56880
gaaggtgacc gccgacgcgg gcctgcgcct gttcttcgac gccgcccacg ccatcgggtg 56940
cacctcacag ggccgcccgg tggggcggtt cggccacgcc gaggtgttca gcttccacgc 57000
gacgaaggtc gtcaacgcct tcgagggcgg ggcgatcgtc accgacgacg acgacctcgc 57060
ccaccgcgtc cgctccctgg cgaacttcgg cttcggcctg cacagcccca gcgcggccgg 57120
cggcaccaac gcgaagatga gcgaggcgtc cgccgccatg gggctcacct cgctcgacgc 57180
gttccccgag gtggcccgcc acaaccaggc caactacgag cagtactgcg gtgagctggc 57240
ccggattccc ggcctcagcg tgatcgactt cgcccccgac gagcggcaca actaccagta 57300
cgtgatcgtc gagatcgacc cggacgtcac cgggttgcac cgcgacctgc tcgtcgacct 57360
gctccgggcc gagaacgtcg tggcgcagcg ctacttctcg ccggcctgtc accaattgga 57420
gccctaccgg tcccggcagc agttccagct gccgcacacc gagcggctct cggcgcgcgt 57480
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cctggcgctg ccgaccggct ccgccatctc ccgggaagac atccgcaggg tgtgcaacat 57540 
cgtgcggttg gcggtctccc ggggattcga attgaccgct cggtggcagc agcagcccgg 57600 
gcccgacgga cagagcgtgg tggcacccgg ttgaccgaac ggcaccggac ggacgtgtgg 57660 
gagggcccgt gaccatggag atctccgcct cgaatcccgt ggcgacctgc gctgtccccg 57720 
gcagcgaccc gaccgcggcg gcgcgcgtgc tgtacgacga ggtcgccggg tcaggaatcg 57780 
tgccgccggc agagatcggg gccgccgccc aggggttggt ggcattggca cgcatctacg 57840
ggaccacacc ttttctgccg cttgagcagg cccgccgcga aatcggcctg gaccgggccg 57900 
ggttcgggcg gctgctggac ctgttcgccc ggattcccgg gttgcgcacc gcagtggaga 57960 
acggaccgtc cggtcgctac tggaccaaca cggtgctcgg cctcgaaagg gccggcgtct 58020 
tcgacgccgt gctcgaccgg aggccggcgt ttccgcatct cgtcgggctc tacccgggcc 58080 
ccacgtgcat gttccgctgt cacttctgcg taagggtcac cggggcccgc taccaggcct 58140 
cggcgctgga cgacgggaac gccatgttcg cctctgtcat cgacgaggtc cccgcgcaca 58200 
accgcgacgc ggtgtacgtc tccggtggcc tcgagccact caccaacccc gggctcggtg 58260 
cactggtcag ccgggcggcc gagcggggat ttcggatcat cctctacacc aactcgttcg 58320 
ccctcacgga gcagaagctc aagggtgagc ggggattgtg gagcctgcac gccatccgca 58380 
cgtcgctgta cgggttgaac gacgaggaat accgggcgac caccggcaag cagggggcct 58440 
tcacccgggt acgggcgaac ctcacgcggt tccagcagct gcgtgccgag cggggcgagc 58500 
cggtgcggct cggcctcagc tacatcgtcc tgcccggccg cgccgggcgg ctgagcgcgc 58560
tgatcgactt cgtcgccgag ctcaacgagg cggcaccgga ccgcccgctg gactacatca 58620 
acctgcggga ggactacagc gggcggccgg acgggaagct ctccctggac gagcgcgccg 58680 
agctccaggc cgagctgcac cggttccggg agagggcaat gcagcggacg ccgaccctgc 58740 
acatcgacta cggctacgcc ctgcacagcc tgatgacggg aagcgacgtg gagctcgtgc 58800 
gtatccggcc ggagacgatg cgccctgcgg cccacccgca ggtgtcggtg caggtggata 58860 
tcctcggtga tgtctacctc tatcgggagg cggcgtttcc gggcctggcc ggtgccgadc 58920 
gctatcgcat cggcacggta tctcccggca cgacgttggc gcaggtggtg gagacgttcg 58980 
tgaccagcgg cggatcggtg gtcgcgaagc ctggcgacga atacttcctg gacggattcg 59040 
accaggcggt gaccgcgcgg ctgaaccaga tggagaccga cgtcgccgat ggctggggag 59100 
accgacgggg tttcctccgc tgatggagat cgactggtga gagcgggtgg ccaacgccga 59160 
agaaagccag ttgccggtgg cccgcaccgc cgtttcagtc gtcgggtata gtgcccgtca 59220
tggctgttgt gtgcttcatg aggctccgcc gcgcatagcg gcggaccatc gcttctcttg 59280
atgagtgtcg ccgcccatcg ggtcactgcc ggtgcggcgt tccctgccga ccggctccga 59340
acgatattcg cggagcacgc acatgcccta catccagcac gccgggcgac atgaattcgg 59400
ccagaatttc ctggtcgacc gctcggtgat cgacgatttc gtcgaactcg tcgcccggac 59460
cgacggccct atcgtggaga tcggcgccgg cgacggtgcg ctgaccctac ccctgagccg 59520
gcagggaagg gagttgaccg cagtggagat cgactccaag cgttccaagc ggctcagccg 59580
gcagacaccc gacaacgtca ccgtggtctg cgcggatgtc ctgagcttcc ggttccccca 59640
gcatccgcac gtggtcgtcg ggaacatccc cttccacgtg accaccccca tcgtgcgggc 59700
tctcctcgcc gcggaccact ggcacacggc ggtgctgctg gtgcagtggg aggtggcccg 59760
caggcgggcc ggcgtcggcg gcgcgacgct gctgaccgcg agctggtggc cctggtacga 59820
cttcgaactg cactcccggg ttccggcccg cgccttccgg cctgtccctt ccgtcgacgg 59880
cgggctgttc tccatggtcc gtcgcgggac cccgctggtc gacgaccgga ggggttacca 59940
ggaattcgtc cggctggtgt tcaccggcaa ggggcacgga ttgccggaga tccttcagcg 60000
gaccgggcgg atcgcccgca aggaccagca ggactggcaa cgggccaacc gggtggggcc 60060
gcagcacctg cccaaggacc tgaccgccca ccagtgggcc tccctgtggc acctggtggc 60120
acccgcccgg ccggccggcc cccgccgtcc ggcaccgcgc cggccaggaa gccccgcttc 60180
ggcgcgccgg cgctga 60196


<210> 2 
<211> 560 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca







<400> 2 

Val Pro Val Pro Thr Gln Glu Ala Pro Leu Arg Asn Ser Pro Pro Pro
1 5 10 15

Ala His Ser Gln Leu Val Leu Ser Glu Val Thr Lys His Tyr Ala Glu
20 25 30

Arg Val Val Leu Asp Arg Val Ser Leu Thr Val Lys Pro Gly Glu Arg
35 40 45

Val Gly Val Ile Gly Glu Asn Gly Ser Gly Lys Ser Thr Leu Leu Arg
50 55 60

Leu Val Ala Gly Leu Glu Thr Pro Asp Asn Gly Glu Leu Thr Val Ser
65 70 75 80

Ala Pro Gly Gly Ile Gly Tyr Leu Ala Gln Arg Leu Arg Leu Pro Ala
85 90 95

Gly Gly Ser Thr Val Arg Asp Val Val Asp His Thr Leu Ala Asp Leu
100 105 110

Arg Asp Leu Glu Ala Arg Leu Arg Ala Ala Glu Ala Asp Leu Ala Thr
115 120 125

Ala Thr Pro Glu Gln Leu Asp Ala Tyr Gly Thr Leu Leu Thr Val Phe
130 135 140

Glu Ala Arg Gly Gly Tyr Gln Ala Asp Ala Arg Val Asp Ala Ala Leu
145 150 155 160

His Gly Leu Gly Leu Ala Glu Leu Asp Arg Asp Arg Asp Val Asp Thr
165 170 175

Leu Ser Gly Gly Glu Arg Ser Arg Leu Ala Leu Ala Ala Thr Leu Ala
180 185 190

Ala Ala Pro Glu Leu Leu Leu Leu Asp Glu Pro Thr Asn Asp Leu Asp
195 200 205

Ile Glu Ala Val Glu Trp Leu Glu Asp His Leu Arg Ser His Arg Gly
210 215 220

Thr Val Val Val Val Thr His Asp Arg Val Phe Leu Glu Ser Val Thr
225 230 235 240

Ser Thr Ile Leu Glu Val Asp Thr Asp Thr Arg Ala Val His Arg Tyr
245 250 255

Gly Asp Gly Tyr Ala Ser Tyr Leu Arg Ala Lys Ala Ala Leu Arg Glu
260 265 270

Ser Arg Glu Arg Ala Tyr Ala Glu Trp Val Ala Glu Val Glu Arg Gln
275 280 285

Ser Gln Leu Ala Glu Arg Ala Gly Thr Met Leu Arg Ser Ile Ser Arg
290 295 300

Lys Gly Pro Ala Ala Phe Ser Gly Ala Gly Ala His Arg Ser Arg Ser
305 310 315 320

Ser Ser Thr Ala Thr Ser Arg Lys Ala Arg Asn Ala Asn Glu Arg Leu
325 330 335

Arg Arg Leu Arg Glu Asn Pro Val Pro Arg Pro Ala Asp Pro Leu Arg
340 345 350

Phe Thr Ala Ser Val Ala Pro Asp Ala Thr Asp Ala Asp Thr Arg Arg
355 360 365

Val Glu Leu Thr Asp Val Arg Val Gly Arg Arg Leu His Val Pro Glu
370 375 380

Leu Thr Ile Gly Pro Ala Glu Arg Leu Leu Val Thr Gly Pro Asn Gly
385 390 395 400

Ala Gly Lys Ser Thr Leu Met Arg Val Leu Ala Gly Glu Leu Val Pro
405 410 415

Asp Gly Gly Thr Val Arg Leu Pro Ala Arg Ile Gly His Leu Arg Gln
420 425 430

Asp Val Thr Val Gly Gln Pro Gly Arg Ser Leu Leu Glu Thr Tyr Ala
435 440 445

Ser Gly Arg Pro Gly His Pro Glu Glu Tyr Ala Glu Glu Leu Leu Ala
450 455 460

Arg Gly Leu Phe Arg Pro Asp Asp Leu Arg Met Pro Val Gly Thr Leu
465 470 475 480

Ser Val Gly Gln Arg Arg Arg Ile Asp Leu Ala Arg Leu Val Ala Arg
485 490 495

Pro Ala Asp Leu Leu Leu Leu Asp Glu Pro Thr Asn His Phe Ala Pro
500 505 510

Leu Leu Val Glu Glu Leu Glu Gln Ala Leu Asp Gly Tyr Ala Gly Ala
515 520 525 

Leu Val Val Val Thr His Asp Arg Arg Met Arg Ser Thr Phe Thr Gly 



<210> 3 
<211> 1683 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca
<400> 3 

gtgccagttc cgacacagga ggcccccttg cggaacagcc cgccgccagc ccattcgcag 60 
ctcgtcctga gcgaggtcac gaagcactac gccgagcggg tcgtcctgga ccgcgtttcg 120
ctcaccgtca agccggggga gcgggtcggc gtcatcggcg agaacgggtc ggggaagtcg 180 
accctgctgc ggctcgtcgc ggggctggag acgccggaca acggcgagtt gaccgtctcg 240 
gcgcccgggg gcatcggcta tctcgcccag cggcttcggc tgccggccgg cggcagcacc 300 
gtacgggatg tggtggacca cacgctcgcc gacctgcgag acctggaggc gcggttgcgc 360 
gccgccgagg cggacctggc caccgccacg cccgagcagt tggacgccta cggcacgctg 420 
ctcactgtgt tcgaggcccg pggcggctac caggccgacg cccgggtgga cgccgccctg 480 
cacggtctcg gcctggccga gctcgaccgc gatcgcgacg tcgacacgct ctccggcggg 540 






gaacggtccc ggctcgcgct cgccgcgacc ctggccgccg cgccggaact gctgctgctc 600 

gacgagccca ccaacgacct cgacatcgag gccgtggagt ggctggagga tcacctgcgg 660 

tcgcaccggg gcaccgtcgt cgtggtcacfc cacgaccggg tgttcctgga gtcggtcacg 720 

tccaccatcc tcgaggtcga caccgacacc cgggccgtgc accggtacgg cgacggctat 780 

gccagctacc tgcgggccaa ggccgccctc cgggagagcc gggagcgcgc gtacgcggaa 840 

tgggtggccg aggtcgagcg gcagtcccaa ctcgcggagc gggccgggac gatgctccgg 900 

tcgatctccc gcaagggacc ggctgcgttc agcggggccg gtgcccaccg ctcccggtcg 960 

tcgtcgacgg cgacgtcacg caaggcccgc aacgccaacg agcggcttcg ccggctgcgg 1020 

gagaatccgg taccgcgacc cgccgacccg ttgcgcttca ccgcgtcggt cgccccggat 1080 

gccacggacg ccgatacccg ccgcgtcgag ttgaccgacg tccgggtggg ccgccgcctg 1140 

cacgtgcccg agctgaccat cggacccgcc gaacggttgc tggtgaccgg acccaacggc 1200 

gcgggtaaga gcaccctgat gcgggtgctc gccggggaac tcgtgcccga cggcggaacg 1260

gtgcggctgc cggctcggat cggccacctg cgtcaggacg tgacggtcgg gcagcccggg 1320 

cgctctctgc tggagacgta cgcgtcgggt cggccggggc atcccgagga gtacgcggag 1380 

gagttgctcg cccgcggtct gttccggccc gatgacctgc gcatgccggt cgggacgctc 1440 

tccgtcgggc agcgccgccg gatcgacctg gcccggctgg tcgcccgccc ggccgacctg 1500 

ctgctgttgg acgagcccac caaccacttc gcgcccctgc tcgtggagga gctggaacag 1560 

gcgctggacg gctacgccgg agcgctggtc gtggtgacgc acgaccggcg gatgcggagc 1620 

accttcaccg gggctcggct ggaactgcac cagggcgtgg ccaccggggc gagccgggcc 1680 

1683 

<210> 4 
<211> 264 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 4



Pro Val Asn Asp Pro Ala Val Arg Leu Phe Cys Phe Pro His Ala Gly
20 25 30

Gly Ala Ala Ser Ala Tyr Leu Pro Phe Ala Arg Arg Leu Ala Ala Asp
35 40 45

Val Asp Val Leu Ala Val Gln Tyr Pro Gly Arg Gln Asp Arg Arg Gly
50 55 60

Glu Pro Leu Ile Glu Ser Val Asp Ala Leu Val Asp Gly Leu Leu Pro
65 70 75 80

Ala Leu Leu Ala Trp Ala Asp Arg Pro Val Ala Phe Phe Gly His Ser
85 90 95

Met Gly Ala Thr Val Ala Phe Glu Ala Ala Arg Arg Leu Pro Pro Ala
100 105 110

Asp Ala Asp Arg Leu Val His Leu Phe Ala Ser Gly Arg Arg Ser Pro
115 120 125

Ser Val Gly Arg Arg Asp Arg Phe Tyr Arg Phe Asp Asp Glu Leu Ile
130 135 140

Asp Glu Ile Arg Arg Leu Gln Gly Thr Asp Ser Ser Leu Leu Asp Asp
145 150 155 160

Arg Glu Leu Leu Asp Met Leu Leu Pro Ala Ile Arg Asn Asp Tyr Arg
165 170 175

Ala Ala Ala Ala Tyr Glu Tyr Arg Pro Gly Pro Arg Leu Arg Cys Pro
180 185 190

Val Thr Val Leu Ala Gly Ala Ala Asp Thr His Val Thr Thr Asp Glu
195 200 205

Ala Ala Ala Trp Ala Glu Val Thr Ala Ala Ala Thr Met Val Arg Thr
210 215 220

Phe Pro Gly Gly His Phe Tyr Leu Asn Asp Gln Leu Asp Ala Val Cys
225 230 235 240

Ala Glu Val Thr Thr Thr Leu Ala Ala Val Ser Thr Thr Ala Leu Thr
245 250 255 

Ala Val Pro Gly Ala Asp Pro Gly 
260 

<210> 5 
<211> 795 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 5 

atgtccccgt ccgccgatcc gtccgagctg tggctacgcc gctaccggcc cgtcaacgac 60 
cccgccgtcc ggctgttctg cttcccgcac gccgggggcg cggccagcgc gtacctgccg 120 

ttcgcccgcc ggctcgccgc cgacgtggac gtgctggcgg tccagtaccc gggccggcag 180 
gaccgccgcg gcgaaccctt gatcgagtcc gtcgacgccc tggtggacgg gctcctgccc 240 
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gcactgctcg cctgggcgga ccgaccggtg gccttcttcg gtcacagcat gggcgccacg 300 

gtggccttcg aggccgcccg ccggctccca ccggccgacg ccgatcggct cgtgcacctc 360

ttcgcctccg gccgccgtag cccgtccgtc gggcggcggg accggttcta ccggtttgac 420 

gacgagctga tcgacgagat ccgccggctc cagggcaccg attccagcct cctggacgac 480 

agggaactgc tggacatgct cctccccgcc atccgcaacg actaccgggc cgccgccgcc 540 

tacgaatacc ggccagggcc caggctgcgt tgcccggtca ccgtactcgc cggggccgcc 600 

gacacGcacg tcaccaccga cgaggccgcg gcgtgggccg aggtgaccgc agcggccacg 660 

atggtccgca cgttcccggg cgggcacttc tatctcaacg atcagctcga cgctgtgtgc 720 

gccgaggtca cgaccaccct cgcagcggtg tccacgaccg ccctcacggc ggtgccgggt 780 
gccgaccccg gctga 

<210> 6 
<211> 410 
<212> PRT

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 6 

Met Thr Gln Thr Pro Asn Ala Pro Ala Gly Pro Ile Asp Leu Pro Lys
1 5 10 15

Gly Ala Asp Ala Gln Gly Leu Leu Asp Trp Phe Ala Tyr Met Arg Lys
20 25 30

Asn Trp Pro Val Ser Trp Asp Glu Thr Arg Gln Ala Trp His Val Phe
35 40 45

Ser Tyr Arg Asp Tyr Gln Thr Val Thr Thr Asn Pro Leu Ile Phe Ser
50 55 60

Ser Asp Phe Thr Ser Val Phe Pro Val Pro Ser Glu Leu Ala Leu Leu
65 70 75 80

Met Gly Pro Gly Thr Ile Gly Gly Ile Asp Pro Pro Arg His Ala Pro
85 90 95

Leu Arg Lys Leu Val Ser Gln Ala Phe Thr Pro Arg Arg Ile Ala Gln
100 105 110

Met Glu Leu Arg Ile Gly Gln Ile Thr Ala Asp Val Leu Asp Gln Val
115 120 125

Arg Asp Gln Asp Arg Ile Asp Ile Ala Ser Asp Leu Ala Tyr Pro Leu
130 135 140

Pro Val Thr Val Ile Ala Glu Leu Leu Gly Ile Pro Thr Lys Asp His
145 150 155 160

Glu Lys Phe Arg Glu Trp Val Asp Ile Ile Leu Ser Asn Glu Gly Leu
165 170 175

Glu Tyr Pro Asn Leu Pro Asp Asp Phe Thr Glu Thr Val Gly Pro Ala
180 185 190

Ile Glu Glu Trp Ser Glu Phe Leu Tyr Ala Gln Ile Ala His Lys Arg
195 200 205

Ala Glu Pro Lys Asp Asp Leu Ile Ser Gly Leu Cys Ala Ala Glu Val
210 215 220

Asp Gly Arg Lys Leu Thr Asp Glu Glu Val Val Asn Ile Val Ala Leu
225 230 235 240

Leu Leu Thr Ala Gly His Ile Ser Ser Ala Thr Leu Leu Ser Asn Leu
245 250 255

Phe Leu Val Leu Glu Glu His Pro Gln Ala Gln Ala Ala Val Arg Ala
260 265 270

Asp Arg Ser Leu Val Pro Gly Val Ile Glu Glu Thr Leu Arg Tyr Arg
275 280 285

Ser Pro Phe Asn Cys Ile Phe Arg Ile Leu Asn Glu Asp Thr Asp Ile
290 295 300

Leu Gly His Pro Met Arg Lys Gly Gln Met Val Ile Ala Trp Ile Ala
305 310 315 320

Ser Ala Asn Arg Asp Thr Glu Val Phe Thr Asp Pro Asp Thr Phe Asp
325 330 335

Ile Arg Arg Glu Ser Asn Lys His Leu Ala Phe Gly His Gly Ile His
340 345 350

His Cys Leu Gly Ala Phe Leu Ala Arg Leu Glu Ala Lys Val Phe Leu
355 360 365

Asn Gln Thr Leu Asp Gln Phe Thr Glu Phe Arg Ile Asp His Val Gly
370 375 380

Val Glu Phe Tyr Asp Ala Asp Gln Leu Thr Ala Arg Arg Leu Pro Val
385 390 395 400

Gln Val Val Arg Asp Gly Arg His Pro Lys
405 410

<210> 7 
<211> 1233 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 7 

atgacgcaga ccccgaacgc cccggcggga ccgatcgacc tgcccaaggg cgccgacgcc 60 
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caggggctgc tggactggtt cgcgtacatg cggaagaact ggcccgtctc ctgggacgag 120 

acccgtcagg cctggcacgt gttctcctac cgggactacc agaccgtgac caccaacccg 180 

ctgatcttct cgtcggactt cacctcggtc tttcccgtac cgtcggagct ggccctgctg 240 

atgggccccg gcaccatcgg cggcatcgac ccgccgcggc acgcgccgct gcgcaagctg 300 

gtgagccagg cgttcacccc ccgccggatc gcccagatgg agctgcggat cgggcagatc 360 

accgccgacg tgctcgacca ggtacgcgac caggaccgga tcgacatcgc cagcgacctc 420 

gcgtacccgc tgccggtgac ggtcatcgcc gagctgctcg gcattcccac caaggatcac 480 

gagaagttcc gcgagtgggt ggacatcatc ctcagcaacg aagggctgga gtatcccaac 540

ctcccggacg acttcaccga gacggtgggc cccgccatcg aggagtggtc cgaattcctg 600 

tacgcccaga tcgcccacaa gcgcgccgaa ccgaaggacg acctgatcag cggcctctgt 660 

gcggcggagg tcgacgggcg caagctgacc gacgaggaag tcgtcaacat cgtcgcgctg 720 

ctgctcaccg ccgggcacat ctccagcgcc acgctgctca gcaacctgtt cctggtgctg 780 

gaggagcacc cgcaggcaca ggccgcggtc cgcgccgacc gcagcctcgt gccgggcgtg 840 

atcgaggaga cgctgcgcta ccggtccccg ttcaactgca tcttccggat cctgaacgag 900 

gacaccgaca tcctcggcca ccccatgcgc aagggccaga tggtgatcgc ctggatcgcc 960 

tccgcgaacc gcgacaccga ggtgttcacg gacccggaca ccttcgacat ccgacgcgag 1020 

tcgaacaagc acctggcgtt cggccacggc atccaccact gcctgggcgc gttcctggcc 1080 

aggctggagg cgaaggtctt cctcaaccag acgctcgacc agttcaccga gttccggatc 1140 

gaccacgtcg gggtcgagtt ctacgacgcc gaccagctca ccgcgcgacg cctccccgtc 1200 

caggtggtac gcgacggacg gcacccgaag taa 1233 

<210> 8 
<211> 402 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 8 

Met Glu His Pro Val Thr Ala Gly Ser Cys Arg Phe Tyr Pro Phe Ser
1 5 10 15

Asp Arg Thr Asp Leu Asn Ile Asp Pro Thr Tyr Gly Glu Leu Arg Ser
20 25 30

Lys Glu Pro Val Ala Arg Val Arg Met Pro Tyr Gly Gly Asp Ala Trp
35 40 45

Leu Val Thr Arg His Ala Asp Ala Lys Lys Ala Leu Ser Asp Pro Arg
50 55 60

Leu Ser Ile Ala Ala Gly Ala Gly Arg Asp Val Pro Arg Ala Ser Pro
65 70 75 80

Arg Leu Gln Glu Pro Asp Gly Leu Met Gly Leu Pro Pro Asp Ala His
85 90 95

Ala Arg Leu Arg Arg Leu Val Ala Thr Ala Phe Thr Pro Lys Arg Val
100 105 110

Arg Asp Ile Ala Pro Arg Val Val Gln Leu Ala Asp Lys Leu Leu Asp
115 120 125

Asp Val Val Glu Thr Gly Pro Pro Ala Asp Leu Val Gln Gln Leu Ala
130 135 140

Leu Pro Leu Pro Val Met Ile Ile Cys Glu Met Met Gly Ile Gly Tyr
145 150 155 160

Asp Glu Gln His Leu Phe Arg Ala Phe Ser Asp Ala Leu Met Ser Ser
165 170 175

Thr Arg Tyr Thr Ala Asp Gln Val Asp Arg Ala Val Glu Asp Phe Val
180 185 190

Glu Tyr Leu Gly Gly Leu Leu Ala Gln Arg Arg Ala His Arg Thr Asp
195 200 205

Asp Leu Leu Gly Ala Leu Val Glu Ala Arg Asp Asp Gly Asp Arg Leu
210 215 220

Thr Glu Asp Glu Leu Val Met Leu Thr Gly Gly Leu Leu Val Gly Gly
225 230 235 240

His Glu Thr Thr Ala Ser Gln Ile Ala Ser Gln Ile Phe Leu Leu Leu
245 250 255

Arg Asp Arg Thr Arg Tyr Glu Gln Leu His Ala Arg Pro Glu Leu Ile
260 265 270

Pro Thr Ala Val Glu Glu Leu Leu Arg Val Ala Pro Leu Trp Ala Ser
275 280 285

Val Gly Pro Thr Arg Ile Ala Thr Glu Asp Leu Glu Leu Asn Gly Thr
290 295 300

Thr Ile Arg Ala Gly Asp Ala Val Val Phe Ser Leu Ala Ser Ala Asn
305 310 315 320

Gln Asp Asp Asp Val Phe Ala Asn Ala Ala Asp Val Val Leu Asp Arg
325 330 335

Asp Pro Asn Pro His Ile Ala Phe Gly His Gly Pro His Tyr Cys Ile
340 345 350

Gly Ala Ser Leu Ala Arg Leu Glu Ile Gln Ala Ala Ile Gly Ala Leu
355 360 365

Ala Arg Arg Leu Pro Gly Leu Arg Leu Ala Val Glu Glu Asn Glu Leu 
370 375 380 

Asp Trp Asn Lys Gly Met Met Val Arg Ser Leu Val Ser Leu Pro Val 
385 390 395 400 

Thr Trp 

<210> 9 
<211> 1209 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 9 

atggagcatc cagtaacggc cgggtcctgc aggttctacc ccttcagtga ccgtaccgac 60 

ctgaatatcg atcccacgta cggcgaactg cgctcgaaag agccggtcgc ccgcgtccgc 120 

atgccctacg gcggggacgc ctggctggtc acccggcacg ccgacgccaa gaaggccctc 180 

tctgaccccc gactcagcat tgcagccgga gccgggcggg acgtgccgcg cgcctccccc 240 

cgtctccagg aacccgacgg tctgatgggt cttccccccg acgcgcacgc ccgactgcgc 300 

aggctcgtcg ccacggcgtt cacgccgaag cgcgtacggg acatcgcccc gcgcgtcgtc 360 

cagctcgccg acaagcttct cgacgacgtg gtcgaaaccg ggccgccggc cgacctcgtg 420 

cagcagctcg cgcttcccct gccggtgatg atcatctgcg agatgatggg catcgggtac 480 

gacgagcagc acctgttccg tgccttcagc gatgccctga tgtcctccac ccgatacacg 540 

gccgaccagg tcgaccgcgc ggtagaggac ttcgtcgagt acctcggcgg cctcctcgcg 600 

cagcgccgtg cacaccgcac cgacgacctc ctcggcgccc tggtcgaggc gcgagacgac 660 

ggcgatcggc tgaccgagga cgaactcgtc atgctcaccg gcggcctgct cgtcggcggc 720 

cacgagacga ccgccagcca gatcgcctcg cagatcttcc tcctgctgcg cgaccggacc 780 

aggtacgagc aactccatgc ccgtccggag ttgatcccca cggcagtcga ggaactgctg 840 

cgggtggccc cgctctgggc ctcggtcggc cccacccgca tcgccaccga ggacctggaa 900 

ctcaacggga cgaccatccg ggccggcgac gccgtcgtct tctcgctggc gtccgccaat 960 

caggacgacg acgtcttcgc gaatgccgca gacgtcgtgc tcgaccgcga cccgaatccg 1020 

cacatcgcct tcgggcacgg gccccattac tgcatcgggg cgtcactggc cagactggaa 1080 

atacaggccg ccatcggcgc cttggccagg cggcttcccg gtctccgcct ggccgtcgag 1140 

gaaaacgaac ttgattggaa caagggaatg atggtacgca gcctcgtgtc ccttccggtg 1200 
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acgtggtga 1209 



<210> 10 
<211> 4471 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 

<400> 10 

Met Arg Val Val Gly Ala Asp Ala Cys Ser Ala Ala Val Pro Ala Gly
1 5 10 15

Pro Arg Met Gly Phe Pro Ala Ser Phe Phe Asp Pro Gly Asp Leu Met
20 25 30

Thr Val Gln Ser Asp Val Leu Arg His Arg Asp Ile Ala Val Ile Gly
35 40 45

Met Ser Cys Arg Leu Pro Gly Ala Pro Ser Ile Glu Glu Phe Trp Asp
50 55 60

Leu Leu Cys Ser Gly Arg Ser Ala Val Asp Arg Gln Pro Asp Gly Gly
65 70 75 80

Trp Arg Ala Val Ile Asp Gly Lys Gly Glu Ser Asp Ala Ala Phe Phe
85 90 95

Gly Met Ser Pro Arg Gln Ala Ala Ala Val Asp Pro Gln Gln Arg Leu
100 105 110

Met Leu Glu Leu Gly Trp Glu Ala Leu Glu Asn Ala Arg Ile Arg Pro
115 120 125

Ala Asp Leu Lys Gly Ser Asp Thr Gly Val Phe Val Gly Leu Thr Ala
130 135 140

Asp Asp Tyr Ala Thr Leu Leu Arg Arg Ser Gly Thr Pro Ile Ser Gly
145 150 155 160

His Thr Ala Thr Gly Leu Asn Arg Ser Leu Thr Ala Asn Arg Leu Ser
165 170 175

Tyr Leu Leu Gly Leu Arg Gly Pro Ser Phe Thr Val Asp Ser Ala Gln
180 185 190

Ser Ser Ser Leu Val Ala Val His Leu Ala Cys Glu Ser Leu Leu Arg
195 200 205

Gly Glu Ser Ala Val Ala Val Val Gly Gly Val Ser Leu Ile Leu Ala
210 215 220

Glu Glu Ser Thr Ala Ala Met Ala Arg Met Gly Ala Leu Ser Pro Asp
225 230 235 240

Gly Arg Cys Phe Thr Phe Asp Ala Arg Ala Asn Gly Tyr Val Arg Gly
245 250 255

Glu Gly Gly Val Ala Met Val Leu Lys Pro Leu Ile Arg Ala Ile Glu
260 265 270

Asp Gly Asp Gln Val His Cys Val Ile Arg Gly Cys Ala Val Asn Asn
275 280 285

Asp Gly Gly Gly Pro Ser Leu Thr His Pro Asp Arg Glu Ala Gln Glu
290 295 300 

Ala Leu Leu Arg Arg Ala Tyr Glu Arg Ala Gly Val Ala Pro Glu His 
305 310 315 320 

Val Asp Tyr Val Glu Leu His Gly Thr Gly Thr Lys Ala Gly Asp Pro 
325 330 335 

Val Glu Ala Ala Ala Leu Gly Ala Val Leu Gly Val Ala Arg Gly Cys 
340 345 350 

Asp Asn Pro Leu Ala Val Gly Ser Val Lys Thr Asn Val Gly His Leu 
355 360 365 

Glu Gly Ala Ala Gly Ile Thr Gly Leu Leu Lys Ala Val Leu Cys Val
370 375 380 

Arg Glu Gly Val Leu Pro Pro Ser Leu Asn Phe Arg Thr Pro Asn Pro 
385 390 395 400 

Asp Ile Arg Leu Asp Glu Leu Asn Leu Arg Val Gln Thr Glu Leu Gln
405 410 415 

Pro Trp Pro Gly Asp Gly Thr Gly Arg Pro Arg Val Ala Gly Val Ser 
420 425 430 

Ser Phe Gly Met Gly Gly Thr Asn Ala His Leu Ile Leu Glu Gln Ala
435 440 445 

Pro Val Ala Ala Glu Glu Thr Ala Val Thr Asp Ala Gly Val Gly Ser 
450 455 460 

Val Arg Val Val Pro Val Val Val Ser Gly Arg Ser Val Gly Ala Leu 
465 470 475 480 

Arg Ala Tyr Ala Gly Arg Leu Arg Glu Val Cys Ala Gly Leu Ser Asp 
485 490 495 

Gly Gly Gly Ser Gly Gly Gly Ser Gly Leu Val Asp Val Gly Trp Ser 
500 505 510 

Leu Val Ser Ser Arg Ser Val Phe Glu His Arg Ala Val Val Phe Gly
515 520 525 

Gly Gly Val Ala Glu Val Val Ala Gly Leu Asp Ala Val Ala Ser Gly 
530 535 540 



TQa Val Ser Ser Gly Ser Val Val Val Gly Ser Val Ala Ser Gly Val
545 550 555 560



Ala Gly Gly Gly Gly Arg Val Val Phe Val Phe Pro Gly Gln Gly Trp
565 570 575

Gln Trp Val Gly Met Gly Ala Ala Leu Leu Asp Glu Ser Glu Val Phe
580 585 590 

Ala Glu Ser Met Val Glu Cys Gly Arg Ala Leu Ser Gly Phe Val Asp 
595 600 605 

Trp Asp Leu Leu Glu Val Val Arg Gly Gly Gly Gly Asp Gly Ser Phe 
610 615 620 

Gly Arg Val Asp Val Val Gln Pro Val Ser Trp Ala Val Met Val Ser
625 630 635 640 

Leu Ala Arg Leu Trp Met Ser Val Gly Val Val Pro Asp Ala Val Val 
645 650 655 

Gly His Ser Gln Gly Glu Val Ala Ala Pro Val Val Gly Gly Val Leu
660 665 670

Ser Val Ala Asp Gly Ala Arg Val Val Ala Leu Arg Ser Arg Val Ile
675 680 685 

Gly Glu Val Leu Ala Gly Gly Gly Ala Met Val Ser Val Gly Leu Pro 
690 695 700 

Val Ala Val Val Leu Asp Arg Leu Ala Gly Trp Gly Gly Arg Leu Gly 
705 710 715 720 

Val Ala Ala Val Asn Gly Pro Ser Leu Thr Val Val Ser Gly Asp Val 
725 730 735 

Asp Ala Ala Val Gly Phe Val Gly Glu Cys Glu Arg Asp Gly Val Trp 
740 745 750 

Val Arg Arg Val Ala Val Asp Tyr Ala Ser His Ser Ala His Val Glu 
755 760 765 

Ala Val Glu Gly Met Leu Ser Gly Leu Leu Gly Gly Leu Cys Pro Gly 
770 775 780 

Arg Gly Val Val Pro Phe Tyr Ser Ser Val Val Gly Gly Val Val Asp 
785 790 795 800 

Gly Val Gly Leu Asp Gly Gly Tyr Trp Tyr Arg Asn Leu Arg Glu Arg 
805 810 815 

Val Leu Phe Ser Asp Val Val Gly Arg Leu Val Gly Asp Gly Phe Ser 
820 825 830 

Gly Phe Val Glu Cys Ser Gly His Pro Val Leu Ala Gly Gly Val Leu 
835 840 845 



Glu Ser Val Ala Val Val Asp Pro Asp Val Arg Pro Val Val Val Gly 
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850 855 860 

Ser Leu Arg Arg Asp Asp Gly Gly Trp Gly Arg Phe Leu Thr Ser Val 
865 870 875 880 

Gly Glu Ala Phe Val Gly Gly Met Ser Val Asp Trp Lys Gly Val Phe 
885 890 895 

Ala Gly Ala Gly Ala Arg Leu Val Asp Leu Pro Thr Tyr Pro Phe Gln
900 905 910

Arg Arg His Tyr Trp Ala Pro Asn Thr Asp Gly Ala Pro Ala Pro Ile
915 920 925

Leu Asp Asp His Ala Glu Ala Glu Asn Glu Pro Ala Glu Ser Glu Pro
930 935 940

Gly Ile Arg Ala Glu Leu Leu Thr Leu Ala Glu Pro Glu Gln Leu Asn
945 950 955 960 

Arg Leu Leu Ala Thr Val Arg Ala Ser Thr Ala Val Val Leu Gly Leu 
965 970 975 

Asp Ser Ala Gln Ala Val Asp Pro Glu Arg Thr Phe Lys Glu His Gly
980 985 990

Phe Glu Ser Val Thr Ala Val Glu Leu Cys Asn His Leu Gln Arg Gly
995 1000 1005

Thr Gly Leu Arg Val Pro Ala Ser Leu Val Tyr Asn His Pro Thr
1010 1015 1020

Pro Met Ala Ala Ala Arg Lys Leu Gln Glu Glu Ile Gln Gly Arg
1025 1030 1035

Gln Pro Glu Asn Val Arg Gln Val Thr Ser Ala Ala Ala Val Asp
1040 1045 1050

Asp Pro Val Val Val Val Gly Met Gly Cys Arg Phe Pro Gly Gly 
1055 1060 1065 

Val Val Cys Ala Glu Gly Leu Trp Asp Leu Val Leu Gly Gly Gly 
1070 1075 1080 

Asp Ala Val Ser Gly Phe Pro Val Asp Arg Gly Trp Asp Val Glu 
1085 1090 1095

Gly Leu Phe Asp Pro Val Arg Gly Val Val Gly Lys Ser Tyr Val 
1100 1105 1110 

Arg Glu Gly Gly Phe Val Tyr Asp Ala Gly Met Phe Asp Ala Glu 
1115 1120 1125 

Phe Phe Gly Val Ser Pro Arg Glu Ala Val Ala Met Asp Pro Gln
1130 1135 1140

Gln Arg Leu Phe Leu Glu Val Ser Trp Glu Ala Leu Glu Arg Ala
1145 1150 1155

Gly Ile Asp Pro Leu Gly Leu Arg Gly Ser Arg Thr Gly Val Tyr
1160 1165 1170 

Val Gly Val Met Gly Gln Glu Tyr Gly Pro Arg Leu Val Glu Ser
1175 1180 1185 

Gly Gly Gly Phe Glu Gly Tyr Leu Leu Thr Gly Thr Ser Pro Ser 
1190 1195 1200 

Val Val Ser Gly Arg Val Ser Tyr Val Leu Gly Leu Glu Gly Pro 
1205 1210 1215 

Ser Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Leu
1220 1225 1230

His Leu Ala Cys Gln Gly Leu Arg Leu Gly Glu Cys Asp Val Ala
1235 1240 1245

Leu Ala Gly Gly Val Thr Val Ile Ala Ala Pro Gly Leu Phe Val
1250 1255 1260

Glu Phe Ser Arg Gln Gly Gly Leu Ser Gly Asp Gly Arg Cys Arg
1265 1270 1275 

Ala Phe Ala Gly Gly Ala Asp Gly Thr Gly Trp Gly Glu Gly Ala 
1280 1285 1290 

Gly Val Val Val Leu Glu Arg Leu Ser Val Ala Arg Glu Arg Gly 

1295 1300 1305 

His Arg Val Leu Ala Val Val Arg Gly Ser Ala Val Asn Gln Asp
1310 1315 1320

Gly Gly Ser Asn Gly Leu Thr Ala Pro Ser Gly Val Ala Gln Arg
1325 1330 1335

Arg Val Ile Gly Ala Ala Leu Val Ala Ala Gly Leu Gly Val Ser
1340 1345 1350

Asp Val Asp Val Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly
1355 1360 1365

Asp Pro Ile Glu Ala Glu Ala Leu Leu Gly Ser Tyr Gly Arg Gly
1370 1375 1380

Arg Val Gly Gly Ala Leu Leu Leu Gly Ser Val Lys Ser Asn Ile
1385 1390 1395

Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Val Ile Lys Met
1400 1405 1410

Val Met Ala Leu Arg Ala Gly Val Val Pro Ala Thr Leu His Val
1415 1420 1425 

Asp Val Pro Ser Pro Leu Val Asp Trp Ser Ser Gly Gly Val Glu 
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1430 1435 1440 

Leu Val Thr Glu Ala Arg Asp Trp Pro Val Val Gly Arg Val Arg 
1445 1450 1455 

Arg Ala Gly Val Ser Ala Phe Gly Val Ser Gly Thr Asn Ala His 
1460 1465 1470 

Leu Ile Leu Glu Gln Ala Pro Glu Phe Asp Asp Pro Val Val Thr
1475 1480 1485

Asp Thr Asp Thr Asp Ala Gly Val Gly Arg Gly Leu Ser Val Val
1490 1495 1500 

Pro Val Val Val Ser Gly Arg Ser Thr Ala Ala Leu Arg Ala Tyr 
1505 1510 1515 

Ala Gly Arg Leu Arg Glu Val Cys Ala Gly Leu Ser Asp Gly Ala 
1520 1525 1530 

Gly Leu Val Asn Val Gly Trp Ser Leu Val Ser Ser Arg Ser Val 
1535 1540 1545 

Phe Glu His Arg Ala Val Val Phe Gly Gly Gly Val Ala Glu Val 
1550 1555 1560 

Val Ala Gly Leu Asp Ala Val Val Ser Gly Ala Val Ala Ser Gly 
1565 1570 1575 

Ser Val Val Val Gly Ser Val Ala Ser Gly Val Ala Gly Gly Gly 
1580 1585 1590 

Gly Arg Val Val Phe Val Phe Pro Gly Gln Gly Trp Gln Trp Val
1595 1600 1605 

Gly Met Gly Ala Ala Leu Leu Asp Glu Ser Glu Val Phe Ala Glu 
1610 1615 1620 

Ser Met Val Glu Cys Gly Arg Ala Leu Ser Gly Phe Val Asp Trp 
1625 1630 1635 

Asp Leu Leu Glu Val Val Arg Gly Gly Ala Gly Glu Gly Val Trp
1640 1645 1650

Gly Arg Val Asp Val Val Gln Pro Val Ser Trp Ala Val Met Val
1655 1660 1665 

Ser Leu Ala Arg Leu Trp Met Ser Val Gly Val Val Pro Asp Ala 
1670 1675 1680 

Val Val Gly His Ser Gln Gly Glu Val Ala Ala Ala Val Val Gly
1685 1690 1695 

Gly Val Leu Ser Val Ala Asp Gly Ala Arg Val Val Ala Leu Arg 
1700 1705 1710 

Ser Arg Val Ile Gly Glu Val Leu Ala Gly Gly Gly Ala Met Val
1715 1720 1725

Ser Val Gly Leu Pro Ile Val Asp Ala Gln Glu Arg Leu Ala Gly
1730 1735 1740 

Trp Gly Gly Arg Leu Gly Val Ala Ala Val Asn Gly Pro Ser Leu 
1745 1750 1755 

Thr Val Val Ser Gly Asp Val Asp Ala Ala Val Gly Phe Val Gly 
1760 1765 1770 

Glu Cys Glu Arg Asp Gly Val Trp Val Arg Arg Val Ala Val Asp 
1775 1780 1785 

Tyr Ala Ser His Ser Ala His Val Glu Ala Val Glu Gly Met Leu
1790 1795 1800 

Ser Gly Leu Leu Gly Gly Leu Cys Pro Gly Arg Gly Val Val Pro 
1805 1810 1815 

Phe Tyr Ser Ser Val Val Gly Gly Val Val Asp Gly Val Gly Leu 
1820 1825 1830 

Asp Gly Gly Tyr Trp Tyr Arg Asn Leu Arg Glu Arg Val Leu Phe
1835 1840 1845

Ser Asp Val Val Gly Arg Leu Val Gly Asp Gly Phe Ser Gly Phe
1850 1855 1860

Val Glu Cys Ser Gly His Pro Val Leu Ala Gly Gly Val Leu Glu 
1865 1870 1875 

Ser Val Ala Val Val Asp Pro Asp Val Arg Pro Val Val Val Gly 
1880 1885 1890 

Ser Leu Arg Arg Asp Asp Gly Gly Trp Gly Arg Phe Leu Thr Ser 
1895 1900 1905 

Val Gly Glu Ala Phe Val Gly Gly Met Ser Val Asp Trp Lys Gly 
1910 1915 1920 

Val Phe Ala Gly Ala Gly Ala Arg Leu Val Asp Leu Pro Thr Tyr 
1925 1930 1935 

Pro Phe Gln Arg Arg His Tyr Trp Ala Pro Thr Pro Thr Asn Pro
1940 1945 1950 

Ala Thr Asn Pro Ala Thr Gly Asp Thr Thr Thr Ala Asp Pro Val 
1955 1960 1965 



Gly Gly Val Arg Tyr Arg Ile Thr Trp Lys Pro Leu Pro Thr Asp
1970 1975 1980

Asp Pro Arg Pro Leu Thr Asn Arg Trp Leu Leu Ile Ala Asp Pro
1985 1990 1995

Gly Thr Ala Gly Ser Glu Leu Ala Ala Asp Ile Thr Ala Ala Leu
2000 2005 2010

Ile Arg Arg Gly Ala Glu Val Glu Leu Leu Ala Val Asp Pro Leu
2015 2020 2025

Ala Gly Arg Ala Arg Ile Ala Glu Leu Leu Ala Thr Thr Thr Ala
2030 2035 2040

Gly Pro Val Pro Leu Ser Gly Ala Val Ser Leu Leu Gly Leu Val
2045 2050 2055

Gln Asp Ala His Pro Gln His Pro Ser Ile Gly Met Gly Val Val
2060 2065 2070

Ser Ser Leu Ala Leu Val Gln Ala Ile Gly Asp Ala Gly Ala Glu
2075 2080 2085

Thr Pro Leu Trp Ser Val Thr Gln Gly Ala Val Ala Val Val Pro
2090 2095 2100

Gln Glu Ala Pro Asp Val Phe Gly Ala Gln Val Trp Ala Phe Gly
2105 2110 2115 

Arg Val Ala Ala Leu Glu Leu Pro Asp Arg Trp Gly Gly Leu Val 
2120 2125 2130 

Asp Leu Pro Ser Val Pro Asn Ala Arg Met Leu Asp Gln Leu Ala
2135 2140 2145

Asn Ala Leu Ala Gly Ala Asp Gly Glu Asp Gln Ile Ala Val Arg
2150 2155 2160

Gly Ser Gly Ile Tyr Gly Arg Arg Val Thr Arg Ala Ala Gly Thr
2165 2170 2175

Ala Arg Arg Glu Trp Arg Pro Arg Gly Asn Ile Leu Val Thr Gly
2180 2185 2190

Gly Thr Gly Ser Leu Gly Gly Arg Val Ala Arg Trp Leu Ala Arg
2195 2200 2205 

Asn Gly Ala Glu His Leu Val Leu Thr Ser Arg Arg Gly Ala Asp 
2210 2215 2220 

Ala Pro Gly Ala Ala Glu Leu Glu Ala Asp Leu Arg Ala Leu Gly 
2225 2230 2235 

Val Glu Val Thr Met Ala Ala Cys Asp Val Ala Asp Arg Ala Ala 
2240 2245 2250 

Leu Ser Asp Val Leu Ala Ala His Pro Pro Thr Ala Val Phe His 
2255 2260 2265 

Thr Ala Gly Val Leu His Asp Gly Val He Asp Thr Leu Ala Ala 
2270 2275 2280 

Gly His Ile Asp Glu Val Phe Arg Pro Lys Thr Ala Ala Ala Leu
2285 2290 2295

Leu Leu Asp Glu Leu Thr Gln His Gln Glu Leu Asp Ala Phe Val
2300 2305 2310

Leu Phe Ser Ser Val Thr Gly Val Trp Gly Asn Gly Gly Gln Ala
2315 2320 2325

Ala Tyr Ala Ala Ala Asn Ala Ser Leu Asp Ala Leu Ala Glu Arg
2330 2335 2340

Arg Arg Ala Ala Gly Leu Pro Ala Thr Ser Ile Ala Trp Gly Leu
2345 2350 2355

Trp Gly Gly Gly Gly Met Ala Glu Gly Ile Gly Glu Gln Asn Leu
2360 2365 2370

Asn Arg Arg Gly Ile Thr Ala Leu Asp Pro Glu Leu Gly Ile Ala
2375 2380 2385

Ala Leu Gln Gln Ala Leu Asp Arg Asp Asp Val Ser Val Thr Val
2390 2395 2400

Ala Asp Val Asp Trp Thr Val Phe Ala Pro Arg Leu Ala Asp Leu
2405 2410 2415

Arg Ser Gly Arg Leu Phe Asp Gly Val Pro Glu Ala Arg Ser Ala
2420 2425 2430

Leu Asp Ala Arg Lys Val Asp Thr Glu Ser Pro Ser Ala Gly Leu
2435 2440 2445

Ala Gln Arg Val Ala Gly Met Pro Asp Ala Glu Arg Gln Arg Val
2450 2455 2460

Leu Leu Glu Thr Val Arg Ala Ala Ala Ala Ala Val Leu Arg His
2465 2470 2475

Glu Thr Val Asp Ala Val Ala Pro Thr Arg Ala Phe Lys Asp Ala
2480 2485 2490

Gly Phe Asp Ser Leu Thr Ala Leu Glu Leu Arg Asn His Leu Asn
2495 2500 2505

Ser Thr Thr Gly Leu Ser Leu Pro Pro Thr Val Val Phe Asp His
2510 2515 2520

Pro Thr Pro Ser Thr Leu Ala Lys Phe Leu Glu Gly Val Leu Val
2525 2530 2535

Gly Ala Ser Ala Glu Glu Val Pro Val Thr Ala Ala Ala Val Pro
2540 2545 2550

Val Asp Glu Pro Ile Ala Ile Val Gly Met Ala Cys Arg Tyr Pro
2555 2560 2565

Gly Gly Ala Asp Thr Pro Glu Lys Leu Trp Asp Leu Leu Leu Ala
2570 2575 2580

Gly Ala Asp Val Ile Gly Pro Ala Pro Asp Asp Arg Gly Trp Asp
2585 2590 2595

Val Asp Ser Phe Phe Asp Pro Val Pro Gly Ala Ala Gly Lys Ser
2600 2605 2610 

Tyr Ala Arg Glu Gly Gly Phe Val Tyr Asp Ala Gly Met Phe Asp 
2615 2620 2625 

Ala Glu Phe Phe Gly Val Ser Pro Arg Glu Ala Val Ala Met Asp 
2630 2635 2640

Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser Trp Glu Ala Leu Glu
2645 2650 2655

Arg Ala Gly Ile Asp Pro Ala Gly Leu Arg Gly Ser Arg Thr Gly
2660 2665 2670

Val Tyr Ser Gly Leu Thr His Gln Glu Tyr Ala Ala Arg Leu His
2675 2680 2685

Glu Ala Pro Gln Glu Leu Glu Gly Tyr Leu Leu Thr Gly Lys Ser
2690 2695 2700 

Val Ser Val Ala Ser Gly Arg Val Ser Tyr Val Leu Gly Leu Glu 
2705 2710 2715 

Gly Pro Ser Ile Ser Val Asp Thr Ala Cys Ser Ser Ser Leu Val
2720 2725 2730

Ala Leu His Leu Ala Cys Gln Gly Leu Arg Leu Gly Glu Cys Asp
2735 2740 2745

Val Ala Leu Ala Gly Gly Val Thr Val Ile Ala Ala Pro Gly Leu
2750 2755 2760

Phe Val Glu Phe Ser Arg Gln Gly Gly Leu Ser Gly Asp Gly Arg
2765 2770 2775

Cys Arg Ala Phe Ala Gly Gly Ala Asp Gly Thr Gly Trp Gly Glu
2780 2785 2790

Gly Ala Gly Val Val Val Leu Glu Arg Leu Ser Val Ala Arg Glu 
2795 2800 2805 

Arg Gly His Arg Val Leu Ala Val Val Arg Gly Ser Ala Val Asn 
2810 2815 2820 

Gln Asp Gly Gly Ser Asn Gly Leu Thr Ala Pro Ser Gly Val Ala
2825 2830 2835

Gln Arg Arg Val Ile Gly Ala Ala Leu Val Ala Ala Gly Leu Gly
2840 2845 2850

Val Ser Asp Val Asp Val Val Glu Ala His Gly Thr Gly Thr Arg
2855 2860 2865

Leu Gly Asp Pro Ile Glu Ala Glu Ala Leu Leu Gly Ser Tyr Gly
2870 2875 2880

Arg Gly Arg Val Gly Gly Ala Leu Leu Leu Gly Ser Val Lys Ser
2885 2890 2895

Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala Gly Val Ile
2900 2905 2910 

Lys Met Val Met Ala Leu Arg Ala Gly Val Val Pro Ala Thr Leu 
2915 2920 2925 

His Val Asp Val Pro Ser Pro Leu Val Asp Trp Ser Ser Gly Gly 
2930 2935 2940 

Val Glu Leu Val Thr Glu Ala Arg Asp Trp Pro Val Val Gly Arg 
2945 2950 2955 

Val Arg Arg Ala Gly Val Ser Ala Phe Gly Val Ser Gly Thr Asn 
2960 2965 2970 

Ala His Leu Ile Leu Glu Gln Ala Pro Glu Phe Asp Asp Pro Ala
2975 2980 2985

Asp Ser Asp Ser Asp Ser Asp Ser Asp Ser Asp Ala Gly Val Val
2990 2995 3000

Asp Gly Gly Glu Gly Gly Val Gly Arg Ser Leu Ser Val Val Pro
3005 3010 3015 

Val Val Val Ser Gly Arg Ser Val Gly Ala Leu Arg Ala Tyr Ala 
3020 3025 3030 

Gly Arg Leu Arg Glu Val Cys Ala Gly Leu Ser Asp Gly Gly Gly 
3035 3040 3045 

Ser Gly Gly Gly Ser Gly Leu Val Asp Val Gly Trp Ser Leu Val 
3050 3055 3060 

Ser Ser Arg Ser Val Phe Glu His Arg Ala Val Val Phe Gly Gly 
3065 3070 3075 

Gly Val Glu Glu Val Val Ala Gly Leu Gly Ala Val Ala Ser Gly 
3080 3085 3090 

Ala Val Ala Ser Gly Ser Val Val Val Gly Ser Val Ala Ser Gly 
3095 3100 3105 

Val Ala Gly Gly Gly Gly Arg Val Val Phe Val Phe Pro Gly Gln
3110 3115 3120

Gly Trp Gln Trp Val Gly Met Gly Ala Ala Leu Leu Asp Glu Ser
3125 3130 3135

Glu Val Phe Ala Glu Ser Met Val Glu Cys Gly Arg Ala Leu Ser
3140 3145 3150

Gly Phe Val Asp Trp Asp Leu Leu Glu Val Val Arg Gly Gly Ala
3155 3160 3165

Gly Glu Gly Val Trp Gly Arg Val Asp Val Val Gln Pro Val Ser
3170 3175 3180

Trp Ala Val Met Val Ser Leu Ala Arg Leu Trp Met Ser Val Gly 
3185 3190 3195 

Val Val Pro Asp Ala Val Val Gly His Ser Gln Gly Glu Val Ala
3200 3205 3210

Ala Ala Val Val Gly Gly Val Leu Ser Val Ala Asp Gly Ala Arg
3215 3220 3225

Val Val Ala Leu Arg Ser Arg Val Ile Gly Glu Val Leu Ala Gly
3230 3235 3240

Gly Gly Ala Met Val Ser Val Gly Leu Pro Ile Val Asp Val Gln
3245 3250 3255 

Glu Arg Leu Ala Gly Trp Gly Gly Arg Leu Gly Val Ala Ala Val 
3260 3265 3270 

Asn Gly Pro Ser Leu Thr Val Val Ser Gly Asp Val Asp Ala Ala 
3275 3280 3285 

Val Gly Phe Val Gly Glu Cys Glu Arg Asp Gly Val Trp Val Arg 
3290 3295 3300 

Arg Val Ala Val Asp Tyr Ala Ser His Ser Ala His Val Glu Ala 
3305 3310 3315 

Val Glu Gly Met Leu Ser Gly Leu Leu Gly Gly Leu Cys Pro Gly
3320 3325 3330

Arg Gly Val Val Pro Phe Tyr Ser Ser Val Val Gly Gly Val Val
3335 3340 3345

Asp Gly Val Gly Leu Asp Gly Gly Tyr Trp Tyr Arg Asn Leu Arg
3350 3355 3360

Glu Arg Val Leu Phe Ser Asp Val Val Gly Arg Leu Val Gly Asp
3365 3370 3375

Gly Phe Ser Gly Phe Val Glu Cys Ser Gly His Pro Val Leu Ala
3380 3385 3390

Gly Gly Val Leu Glu Ser Val Ala Val Val Asp Pro Asp Val Arg
3395 3400 3405 

Pro Val Val Val Gly Ser Leu Arg Arg Asp Asp Gly Gly Trp Gly 
3410 3415 3420 

Arg Phe Leu Thr Ser Val Gly Glu Ala Phe Val Gly Gly Met Ser
3425 3430 3435

Val Asp Trp Lys Gly Val Phe Ala Gly Ala Gly Ala Arg Leu Val
3440 3445 3450

Asp Leu Pro Thr Tyr Pro Phe Gln Arg Arg His Tyr Trp Ala Gln
3455 3460 3465

Thr Ser Pro Ala Gly Val Gly Thr Ala Ala Ala Ala Arg Phe Gly
3470 3475 3480

Met Glu Trp Glu Asp His Pro Leu Leu Gly Gly Ala Leu Ser Val
3485 3490 3495

Gly Gly Ser Arg Ser Leu Leu Leu Ala Gly His Leu Ser Leu Ala
3500 3505 3510

Ser His Ala Trp Leu Thr Asp His Ala Val Ser Gly Thr Val Leu
3515 3520 3525

Leu Pro Gly Thr Ala Phe Val Glu Leu Ala Leu His Ala Ala Ala
3530 3535 3540

Ala Ala Gly Cys Pro Glu Val Glu Glu Leu Arg Leu Glu Ala Pro
3545 3550 3555

Leu Val Val Pro Ala Arg Gly Gly Val Arg Leu Gln Val Leu Val
3560 3565 3570

Asp Asp Pro Asp Asp Gly Ser Asp Arg Arg Ala Val Ser Val Phe
3575 3580 3585

Ser Arg Asp Asp Ala Ala Pro Ala Glu Ser Ala Trp Thr Arg His 
3590 3595 3600 

Ala Val Gly Val Leu Ala Ala Arg Ser Arg Pro Ala Pro Ala Ala 
3605 3610 3615 

Pro Trp His Thr Asp Ala Trp Pro Pro Ser Gly Thr Glu Pro Val 
3620 3625 3630 

Asp Val Ala Asp Leu Tyr Glu Arg Phe Ala Ala Leu Gly Tyr Glu 
3635 3640 3645 

Tyr Gly Glu Ala Phe Ala Gly Leu Gln Gly Val Trp Arg Gly Asp
3650 3655 3660

Gly Glu Val Phe Ala Glu Val Arg Leu Pro Asp Arg Val Ser Ala
3665 3670 3675

Glu Ala Ile Arg Phe Gly Leu His Pro Ala Leu Leu Asp Ala Ala
3680 3685 3690

Leu Gln Gly Trp Leu Ala Gly Asp Leu Val Gly Val Pro Glu Gly
3695 3700 3705

Ser Val Leu Leu Pro Phe Ala Trp Gln Gly Val Val Leu His Ala
3710 3715 3720

Thr Gly Ala Asp Thr Leu Arg Val Arg Ile Gly Arg Ser Gly Asp
3725 3730 3735 

Ser Ala Val Cys Leu His Ala Val Asp Pro Ala Gly Ala Pro Val 
3740 3745 3750 

Leu Ser Leu Asp Ala Leu Ala Leu Arg Pro Leu Val Arg Glu Arg 
3755 3760 3765 

Leu Gly Leu Pro Ala Asp Ala Gly Ala Gly Ala Leu Tyr Arg Val 
3770 3775 3780

Gly Trp Arg Arg Gln Ala Ala Val Ala Gly Ala Ala Asp Arg Arg
3785 3790 3795 

Trp Ala Val Val Ala Pro Asn Gly Ala Glu Ala Asp Gly Ala Ala 
3800 3805 3810 

Glu Pro His Arg Trp Pro Val Ala Ala Val Asp Val His Thr Asp 
3815 3820 3825 

Val Asp Ser Leu Arg Ala Ala Leu Asp Ala Gly Ala Glu Leu Pro 
3830 3835 3840 

Ala Val Val Leu Ala Asp Phe Arg Arg Ala Ala Gly Trp Ser Val 
3845 3850 3855 

Asp Ser Ser Leu Ala Ala Gly Pro Ser Pro Asn Asp Gly Ala Val 
3860 3865 3870 

Gly Asp Gly Ala Val Gly Asp Ala Arg Ala Gly Ala Val Arg Ala
3875 3880 3885

Ala Thr Arg Ala Gly Leu Asp Leu Leu Gln Arg Trp Leu Ala Asp
3890 3895 3900

Glu Arg Phe Ile Ala Ala Arg Leu Val Val Val Thr Glu Arg Ala
3905 3910 3915

Val Ala Ala Gly Pro Asp Glu Asp Val Pro Gly Leu Val His Ala
3920 3925 3930

Gly Leu Trp Gly Leu Leu Arg Ser Ala Gln Ser Glu His Pro Asp
3935 3940 3945

Arg Phe Val Leu Val Asp Val Asp Ala Asp Asp Ser Ser Leu Ala
3950 3955 3960

Ala Leu Pro Ser Ala Leu Ala Met Asp Ala Pro Gln Leu Val Val
3965 3970 3975

Arg Ala Gly Gln Ile Leu Leu Pro Glu Ile Glu Pro Val Arg Pro
3980 3985 3990

Val Pro Glu Pro Glu Gln Ala Glu Pro Glu Pro Gly Ala Val Leu
3995 4000 4005

Asp Pro Asp Gly Thr Val Leu Leu Thr Gly Ala Thr Gly Thr Leu 
4010 4015 4020 

Gly Gly Leu Leu Ala Arg His Leu Val Thr Thr Arg Gly Ala Arg 
4025 4030 4035 

Arg Leu Leu Leu Val Ser Arg Ser Gly Pro Asp Ala Pro Asp Ala 
4040 4045 4050 

Gly Arg Leu Thr Glu Glu Leu Thr Gly Leu Gly Ala His Val Thr 
4055 4060 4065 

Leu Ala Ala Cys Asp Thr Thr Asp Arg Ala Ala Leu Ala Gly Val 
4070 4075 4080 

Leu Gly Gly Ile Pro Ala Glu His Pro Leu Thr Ala Val Val His
4085 4090 4095

Val Ala Gly Val Leu Asp Asp Gly Ala Val Gln Ala Leu Thr Pro
4100 4105 4110 

Glu Arg Val Asp Ala Val Leu Arg Pro Lys Val Asp Ala Ala Leu 
4115 4120 4125 

His Leu His Glu Leu Thr Ala Gly Leu Pro Leu Ala Ala Phe Val 
4130 4135 4140 

Leu Phe Ser Gly Ala Ala Gly Ile Leu Gly Arg Pro Gly Gln Ala
4145 4150 4155

Asn Tyr Ala Ala Ala Asn Thr Phe Leu Asp Ala Leu Ala Gln His
4160 4165 4170

Arg Arg Ala Arg Gly Leu Pro Gly Val Ser Leu Ala Trp Gly Leu
4175 4180 4185

Trp Gly Leu Ala Ser Asp Met Thr Gly His Leu Gly Glu Gln Asp
4190 4195 4200

Leu Arg Arg Met Arg Arg Ser Gly Ile Ala Pro Met Thr Gly Glu
4205 4210 4215

Glu Gly Leu Ala Leu Phe Asp Leu Ala Leu Asp Leu Ala Arg Asp
4220 4225 4230 

Glu Pro Val Leu Val Pro Ala Arg Leu Asp Pro Ala Ala Leu Arg 
4235 4240 4245 

Arg Glu Trp Ala Ala Asn Gly Pro Gly Ala Val Pro Val Leu Leu 
4250 4255 4260 

Arg Gly Leu Val Pro Ala Ala Pro Leu Arg Arg Ala Ala Pro Ser 
4265 4270 4275 

Gly Ala Ala Gly Gly Ala Pro Val Pro Ala Val Ala Ala Pro Gln
4280 4285 4290

Gln Ala Asp Glu Leu Arg Gly Gln Leu Ala Gly Lys Asp Ala Gln
4295 4300 4305

Ala Gln Val Arg Gln Leu Leu Asp Leu Val Arg Ala His Val Ala
4310 4315 4320

Gly Val Leu Ala Leu Arg Glu Ala Ala Asp Val Asp Pro Gly Arg 
4325 4330 4335 

Pro Phe Arg Glu Val Gly Phe Asp Ser Leu Thr Ala Val Glu Leu 
4340 4345 4350 

Arg Asn Arg Leu Gly Ser Ala Thr Gly Leu Arg Leu Ala Pro Ser 
4355 4360 4365

Leu Val Phe Asp His Pro Thr Pro Ser Ala Val Ala Glu His Leu
4370 4375 4380

Val Asp Arg Leu Ala Ala Glu Gly Ala Ala Asp Glu Gly Ala Ala
4385 4390 4395

Ala Leu Thr Gly Leu Asp Ala Val Ala Ala Ala Leu Gly Gly Met
4400 4405 4410

Arg Thr Asp Asp Val Arg Arg Asp Ile Val Arg Arg Arg Leu Glu
4415 4420 4425

Glu Met Leu Ala Leu Val Gly Gly Pro Arg Ser Gly Pro Ala Gly
4430 4435 4440 

Asp Gly Leu Val Asp Ala Thr Val Ala Glu Arg Leu Asp Ser Ala 
4445 4450 4455 

Ser Asp Asp Glu Leu Phe Ala Leu Ile Glu Glu Gln Leu
4460 4465 4470

<210> 11
<211> 13416
<212> DNA
<213> Micromonospora carbonacea subspecies aurantiaca
<400> 11

atgcgagttg tgggcgcaga cgcgtgcagc gcagccgtcc ccgccggacc gcggatgggc 60 
ttcccagcat cgttcttcga cccaggagac ctcatgacdg tgcagagtga cgtgttgcgc 120 
caccgcgata tcgccgtcat cgggatgtcc tgccggcttc ccggcgcgcc gagcatcgag 180
gaattctggg acctgctgtg cagcgggcgg agcgcggtcg accgccagcc cgacggcggt 240 
tggcgggcgg tgatcgatgg gaagggagaa tccgacgccg cgttcttcgg catgtccccg 300 
cgccaggccg ccgcggtcga cccgcaacag cgcctgatgc tcgaactcgg ctgggaggca 360
ctggagaacg cccgcatccg gcccgccgac ctgaagggct ccgacactgg cgtcttcgtg 420
gggctcaccg ccgacgacta cgccaccttg ctgcgccgct ccggcacgcc catcagcggg 480
cacaccgcga caggcctgaa ccgtagcctc acggccaacc gtctctcgta cctgctgggt 540
ctgcgcggcc ccagcttcac cgtggactcc gcgcagtcgt catccctggt cgccgttcac 600
ctggcgtgcg aaagcctgct gcggggcgag agcgcggtcg ccgtcgtcgg cggggtgagc 660

ctcatcbtgg cagaLggagag caccgccgcc atggcgcgta tgggggcact ctctcctgac 720
gggcgttgct tcaccttcga cgcccgggcc aacggctacg tccgtggcga gggtggcgtg 780 
gccatggtcc tcaagccgct gatccgcgcg atcgaggacg gcgaccaggt gcactgcgtc 840
atccggggct gtgccgtcaa caacgacggc ggtggcccca gcctcaccca tcccgaccgg 900 
gaggcccagg aggcattgct gcgccgggcg tacgagcggg cgggggtggc ccccgaacac 960 

gtcgactacg tcgagctgca cggcaccggg acgaaggccg gcgaccccgt cgaggcggcg 1020 

gccctcgggg cggtgctggg tgtcgcccgc ggctgcgaca acccactcgc ggtcggatcg 1080 

gtcaagacca acgtcggcca cctggagggg gcggccggca tcacgggcct gctgaaggcg 1140 

gtgctgtgcg tacgtgaggg ggtgctgccg ccgagcctca acttccgtac gccgaacccg 1200 

gacatccgcc tcgacgagct gaacctccgg gttcagacgg aactgcagcc gtggccgggc 1260

gacgggacgg gccgcccgcg tgtcgccgga gtgagttcct tcggcatggg cggtacgaat 1320 

gcgcatctga ttctcgagca ggctccggtg gcggctgagg aaacggctgt taccgatgcc 1380

ggtgtcggtt cggttcgggt ggttccggtg gtggtgtcgg gtcgttcggt gggggctttg 1440 

cgggcgtatg cgggtcggtt gcgtgaggtg tgcgcggggt tgtctgacgg tggtggctcc 1500

ggtggtggtt ctggtctggt ggatgtgggt tggtcgttgg tgtcgtcgcg gtcggt^ttc 1560 

gagcatcggg cggtcgtgtt cggtgggggt gtcgccgagg tggtggcggg tttggatgcg 1620

gtggcttctg gggcggtgag ttcgggttcg gtggtggtgg gttcggtggc gtcgggtgtt 1680 

gctggtggtg gtggtcgggt ggtgtttgtg tttccgggtc agggttggca gtgggtgggt 1740 

atgggtgcgg ctctgttgga cgagtcggag gtgtttgctg agtcgatggt ggagtgtggg 1800 

cgggcgttgt cggggtttgt ggattgggat ttgttggaag tggtccgcgg tggtgggggt 1860

gacggatcgt ttggtcgggt tgatgtggtg cagccggtgt cgtgggcggt gatggtgtcg 1920 

ttggcgcggt tgtggatgtc ^gtgggtgtg gtgccggatg cggtggtggg tcattcgcag 1980 

ggtgaggttg ctgcgccggt ggtggggggt gtgttgagtg tggctgatgg ggcgcgggtg 2040 

gtggcgttgc ggtcgcgggt gatcggtgag gtgttggcgg gtggtggtgc gatggtgtcg 2100
gtggggttgc cggtggcggt tgtgttggat cggttggcgg ggtggggtgg tcggttgggt 2160

gtggcggcgg tgaatggtcc gtcgttgacg gtggtgtcgg gggatgtgga tgctgctgtg 2220 

gggtttgttg gtgagtgtga gcgggatggg gtgtgggtgc ggcgggtggc ggtggattat 2280 

gcgtcgcatt cggcgcatgt ggaggcggtg gaggggatgc tgtcggggtt gttgggtggt 2340 

ttgtgtccgg ggcggggtgt ggtgccgttt tattcgtcgg tggtgggtgg tgtggttgat 2400

ggggtgggtt tggatggtgg gtattggtat cggaatctgc gtgagcgggt gttgttttcg 2460 

gatgtggtgg ggcggcttgt tggggatggg ttttcggggt ttgtggagtg ttcggggcat 2520 

ccggtgttgg cgggtggggt gttggagtcg gtggcggtgg tggatccgga tgtgcggccg 2580 
gtggtggtgg ggtcgctgcg ccgtgatgat ggtgggtggg gccggttttt gacgtcggtg 2640

ggtgaggcgt tcgtcggcgg gatgagtgtt gactggaagg gtgtgttcgc gggggcgggc 2700 

gcgcggttgg ttgacctgcc gacgtatccg ttccaacgac gccactactg ggcaccgaac 2760 

accgacggcg cgccagctcc gatcctcgat gatcacgcgg aggcggagaa cgaaccagcc 2820 

gaatccgagc cagggattcg ggccgagctt ctgacgttgg ccgagcccga gcaactgaac 2880 

cgactcttgg cgaccgttcg cgccagcacc gccgtcgttc tgggcctcga ctcggcgcag 2940 

gcggtcgato oggagcgcac gttcaaggag oatggattcg aatcggtcac cgccgtcgag 3000 

ctctgtaacc acctgcaacg cggcactggg ctgcgggttc ccgcctcgct tgtatacaac 3060

catcccaccc cgatggccgc tgcccggaag ctgcaggaag aaattcaggg ccggcaaccg 3120 

gagaacgtcc ggcaggtcac ctccgctgct gctgtggatg atccggtggt ggtggtgggg 3180 

atgggttgtc gttttccggg tggggtggtg tgtgcggagg gtttgtggga tttggtgttg 3240 

gggggtgggg atgcggtgtc ggggtttccg gtggatcggg gttgggatgt ggaggggttg 3300 

tttgatccgg tgcggggtgt ggtggggaag tcgtatgtgc gggagggggg gtttgtgtat 3360

gacgcgggga tgttcgatgc ggagtttttt ggtgtgtcgc cgcgtgaggc ggtggcgatg 3420 

gatccgcagc agcgtttgtt tttggaggtg tcgtgggagg cgttggagcg tgcggggatt 3480 

gatccgttgg gtttgcgggg ttcgcggacg ggtgtgtatg tgggggtgat gggtoaggag 3540 

tatgggccgc ggttggtgga gtcgggtggt gggtttgagg gttatttgtt gacg^ggacg 3600

tcgccgagtg tggtgtcggg tcgtgtttcg tatgtgttgg ggttggaggg tccgtcgatt 3660 

tcggttgata cggcgtgttc gtcgtcgttg gtggcgttgc atttggcgtg tcaggggttg 3720 

cggttgggtg agtgtgatgt ggcgttggcg ggtggggtga cggtgatttgc ggcgccgggg 3780 

ttgtttgtgg agttttctcg gcagggtggg ttgtcgggtg atgggcggtg togggcgttt 3840 






gcgggtggtg cggatgggac ggggtggggg gagggtgcgg gggtggtggt gttggagcgg 3900

ttgtcggtgg cgcgggagcg tggtcatcgg gtgttggcgg tggtgcgggg ttctgcggtg 3960 

aatcaggatg gtgggtcgaa tggtttgacg gcgccgtcgg gggtggcgca gcgtcgggtg 4020 

attggtgcgg cgttggtggc ggcgggtttg ggtgtgtcgg atgtggatgt ggtggaggcg 4080 

catgggacgg ggactcggtt gggtgatccg attgaggctg aggcgttgtt ggggtcgtat 4140 

gggcggggtc gtgtgggtgg ggcgttgttg ttgggttcgg tgaagtcgaa tattggtcat 4200 

acgcaggcgg ctgcgggtgt ggcgggtgtg atcaagatgg tgatggcgtt gcgggcgggg 4260 

gtggtgccgg cgacgttgca tgtggatgtg ccgtcgccgt tggtggattg gtcttcgggt 4320 

ggggtggagt tggtgacgga ggcgcgggat tggccggtgg tgggtcgtgt gcgtcgtgcg 4380 

ggtgtgtcgg cgtttggggt gtcggggacg aatgcgcatc tgattttgga gcaggccccc 4440 

gaattcgacg atccggttgt taccgacacc gacaccgatg ctggtgtggg taggggtcta 4500 

tcggtggttc cggtggtggt ttcgggtcgt tcgacggcgg ctttgcgcgc ttatgcgggc 4560 

cggttgcgtg aggtgtgcgc gggtctttcc gatggtgccg gtctggtgaa tgtgggttgg 4620

tcgttggtgt cgtcgcggtc ggtgttcgag catcgggcgg tcgtgtttgg tgggggtgtc 4680 

gccgaggtgg tggcgggttt ggatgcggtg gtttccgggg cggtggcttc gggttcggtg 4740 

gtggtgggtt cggtggcgtc gggtgttgct ggtggtggtg gtcgggtggt gtttgtgttt 4800 

ccgggtcagg gttggcagtg ggtgggtatg ggtgcggcgc tgctggacga gtcggaggtg 4860 

tttgctgagt cgatggtgga gtgtggtcgg gcgttgtcgg ggtttgtgga ttgggatttg 4920 

ttggaggtgg tgcggggtgg ggcgggtgag ggggtgtggg gtcgggttga tgtggtgcag 4980 

ccggtgtcgt gggcggtgat ggtgtcgttg gcgcggttgt ggatgtcggt gggtgtggtg 5040 

ccggatgcgg tggtgggtca ttcgcagggt gaggttgctg cggcggtggt ggggggtgtg 5100 

ttgagtgtgg ctgatggggc gcgggtggtg gcgttgcggt cgcgggtaat tggtgaggtg 5160 

ttggccggtg gtggtgcgat ggtgtcggtc ggactgccga tcgtggatgc gcaggaacgg 5220 

ttggcggggt ggggtggtcg gttgggtgtg gcggcggtga atggtccgtc gttgacggtg 5280 

gtgtcggggg atgtggatgc tgctgtgggg tttgttggtg agtgtgagcg ggatggggtg 5340 

tgggtgcggc gggtggcggt ggattatgcg tcgcattcgg cgcatgtgga ggcggtggag 5400 

gggatgctgt cggggttgtt gggtggtttg tgtccggggc ggggtgtggt gccgttttat 5460 

tcgtcggtgg tgggtggtgt ggttgatggg gtgggtttgg atggtgggta ttggtatcgg 5520
aatctgcgtg agcgggtgtt gttttcggat gtggtggggc ggcttgttgg ggatgggttt 5580

tcggggtttg tggagtgttc ggggcatccg gtgttggcgg gtggggtgtt ggagtcggtg 5640 

gcggtggtgg atccggatgt gcggccggtg gtggtggggt cgctgcgccg tgatgatggt 5700 

gggtggggcc ggtttttgac gtcggtgggt gaggcgttcg tcggcgggat gagtgttgac 5760 

tggaagggtg tgttcgcggg ggcgggcgcg cggttggttg acctgccgac gtatccgttc 5820 

caacgccgcc actactgggc accgactccc accaaccccg ccaccaaccc cgccacgggc 5880

gacaccacca ccgccgaccc ggtgggtggc gtgcggtatc ggatcacctg gaaaccgttg 5940 

ccgacggacg acccccgacc cctcaccaac cgctggctac tcatcgccga cccggggacc 6000 

gccggctcgg agcttgccgc agacatcaca gcagcgctca ttcgcagggg cgccgaggtc 6060 

gagttgctgg ccgtggaccc gctcgcgggc cgggcccgga tcgccgaact gctcgccacc 6120 

acgacggctg ggccggtgcc gctgtcgggc gccgtgtctc ttctcgggct tgtgcaggac 6180 

gcgcatcctc aacacccctc catcggaatg ggcgtggtct cgtcgctggc gctggtgcag 6240 

gccatcggtg acgcgggagc cgagactcct ttgtggagcg tcacgcaggg ggcggtcgct 6300

gtggtgcccc aggaggcgcc ggatgtgttc ggtgcgcagg tgtgggcgtt cgggcgggtg 6360 

gccgccctgg aactgccgga ccgctggggc ggcctggtcg accttccgtc cgtaccgaat 6420 

gcccggatgc tggaccagct cgccaacgcc ctcgccggag cggacggcga ggaccagatc 6480 

gcggtacgcg gctcggggat ctacgggcgt cgggtgacgc gcgcggcggg cactgcgcgc 6540 

cgggaatggc gccctcgcgg gaacatcctg gtgaccggag gtacgggaag tctgggtggc 6600

cgggtggccc ggtggctcgc tcgcaacggt gccgaacacc tcgttctcac cagtcgtcgg 6660 

ggtgccgacg ccccgggggc ggcagaactg gaagctgatc ttcgcgcgct cggtgtcgag 6720 

gtgaccatgg ccgcctgcga tgtagcggac cgggctgcgc tgtccgacgt cctggcggcg 6780 

catccgccca ctgcggtctt ccacaccgcc ggagtcctgc acgacggtgt gatcgacacg 6840 

ctcgccgccg gacacatcga cgaggtcttc cgtccgaaga ccgctgccgc gctgctgctc 6900 

gacgaactca cccagcacca ggagctggac gccttcgtcc tcttctcatc ggttaccgga 6960 

gtctggggca acggcggcca ggcggcgtac gcggcggcga acgcatcgct ggacgccctg 7020 

gcggagcgac gtcgtgccgc aggtcttccc gccacctcca tagcttgggg actgtggggc 7080 

ggcggtggca tggcggaggg gatcggcgag cagaacctga accgccgtgg catcacggcc 7140 

ttggacccgg agctcggcat cgccgctctg cagcaggccc tcgaccgcga tgacgtgtct 7200 

gtcaccgtcg ccgacgtcga ctggacggtt ttcgctccgc gtcttgccga cctgcgctcg 7260 






gggcggctct tcgacggggt gcccgaggcc aggagcgcgc tcgatgcccg gaaagtggac 7320 

accgagtcgc cgagcgccgg ccttgcgcag cgggtggcgg ggatgcccga cgcggaacgg 7380 

cagcgggtcc tcctcgaaac ggtgcgggcg gcggccgcgg cggtcctgag gcacgagacg 7440 

gtggatgcgg tcgcgcccac ccgggccttc aaggacgccg gcttcgactc gctcacggcg 7500 

ctcgaactgc gcaaccacct caacagcacg accggtctga gtctgcctcc gacggtggtc 7560 

ttcgaccacc ccaccccgtc cacgttggcg aagttcctgg agggcgtcct cgtcggcgct 7620 

tctgccgagg aagtcccggt gactgccgca gccgtgcccg tcgacgagcc tattgccatc 7680 

gtcggcatgg cctgccgcta ccccggcgga gccgacactc ccgagaagct ctgggacctc 7740 

ctgctggccg gtgctgacgt catcggccca gcccccgacg accggggctg ggacgtggac 7800 

tccttctttg atcccgtgcc gggcgccgcg gggaagtcgt atgcgcggga gggggggttt 7860

gtgtatgacg cggggatgtt cgatgcggag ttctttggtg tgtcgccgcg tgaggcggtg 7920 

gcgatggatc cgcagcagcg cttgttgttg gagacgtcgt gggaggcgtt ggagcgtgcg 7980 

ggaatcgatc cggcgggtct gcggggtagc cggaccggcg tgtactccgg cctgacccac 8040 

caggagtatg ccgcccgtct gcacgaggct ccgcaggaac tcgagggcta tctgctcacc 8100 

ggcaagtcgg tgagcgtcgc gtcgggtcgt gtttcgtatg tgttggggtt ggagggtccg 8160 

tcgatttcgg ttgatacggc gtgttcgtcg tcgttggtgg cgttgcattt ggcgtgtcag 8220 

gggttgcggt tgggtgagtg tgatgtggcg ttggcgggtg gggtgacggt gattgcggcg 8280 

ccggggttgt ttgtggagtt ttctcggcag ggtgggttgt cgggtgatgg gcggtgtcgg 8340 

gcgtttgcgg gtggtgcgga tgggacgggg tggggggagg gtgcgggggt ggtggtgttg 8400 

gagcggttgt cggtggcgcg ggagcgtggt catcgggtgt tggcggtggt gcggggttct 8460 

gcggtgaatc aggatggtgg gtcgaatggt ttgacggcgc cgtcgggggt ggcgcagcgt 8520 

cgggtgattg gtgcggcgtt ggtggcggcg ggtttgggtg tgtcggatgt ggatgtggtg 8580

gaggcgcatg ggacggggac tcggttgggt gatccgattg aggctgaggc gttgttgggg 8640 

tcgtatgggc ggggtcgtgt gggtggggcg ttgttgttgg gttcggtgaa gtcgaatatt 8700 

ggtcatacgc aggcggctgc gggtgtggcg ggtgtgatca agatggtgat ggcgttgcgg 8760 

gcgggggtgg tgccggcgac gttgcatgtg gatgtgccgt cgccgttggt ggattggtct 8820 

tcgggtgggg tggagttggt gacggaggcg cgggattggc cggtggtggg tcgtgtgcgt 8880 

cgtgcgggtg tgtcggcgtt tggggtgtcg gggacgaatg cgcatctgat tttggagcag 8940
gcccccgagt tcgacgatcc tgccgattcc gattccgatt ccgattccga ttccgatgcc 9000
ggtgtcgtgg atggcggcga gggtggtgtt ggcaggagct tgtcggtggt tccggtggtg 9060
gtgtcgggtc gttcggtggg ggctttgcgg gcgtatgcgg gtcggttgcg tgaggtgtgc 9120
gcggggttgt ctgacggtgg tggctccggt ggtggttctg gtttggtgga tgtgggttgg 9180 
tcgttggtgt cgtcgcggtc ggtgtttgag catcgggcgg tcgtgttcgg tgggggtgtg 9240 
gaggaggttg ttgctggtct tggtgcggtg gcttctgggg cggtggcttc gggttcggtg 9300 
gtggtgggtt cggtggcgtc gggtgttgct ggtggtggtg gtcgggtggt gtttgtgttt 9360 
ccgggtcagg gttggcagtg ggtgggtatg ggtgcggcgc tgctggacga gtcggaggtg 9420 
ttcgccgagt cgatggtgga gtgtggtcgg gcgttgtcgg ggtttgtgga ttgggatttg 9480 
ttggaggtgg tgcgcggcgg ggcgggtgag ggggtgtggg gtcgggttga tgtggtgcag 9540 
ccggtgtcgt gggcggtgat ggtgtcgttg gcgcggttgt ggatgtcggt gggtgtggtg 9600 
ccggatgcgg tggtgggtca ttcgcagggt gaggttgctg cggcggtggt ggggggtgtg 9660 
ttgagtgtgg ctgatggggc gcgggtggtg gcgttgcggt cgcgggtgat cggtgaggtg 9720 
ttggccggtg gtggtgcgat ggtgtcggtc ggactgccga tcgtggatgt gcaggaacgg 9780 
ttggcggggt ggggtggtcg gttgggtgtg gcggcggtga atggtccgtc gttgacggtg 9840
gtgtcggggg atgtggatgc tgctgtgggg tttgttggtg agtgtgagcg ggatggggtg 9900 
tgggtgcggc gggtggcggt ggattatgcg tcgcattcgg cgcatgtgga ggcggtggag 9960 
gggatgctgt cggggttgtt gggtggtttg tgtccggggc ggggtgtggt gccgttttat 10020 
tcgtcggtgg tgggtggtgt ggttgatggg gtgggtttgg atggtgggta ttggtatcgg 10080
aatctgcgtg agcgggtgtt gttttcggat gtggtggggc ggcttgttgg ggatgggttt 10140 
tcggggtttg tggagtgttc ggggcatccg gtgttggcgg gtggggtgtt ggagtcggtg 10200 
gcggtggtgg atccggatgt gcggccggtg gtggtggggt cgctgcgccg tgatgatggt 10260 
gggtggggcc ggtttctgac gtcggtgggt gaggcgttcg tcggcgggat gagtgttgac 10320 
tggaagggtg tgttcgcggg ggcgggcgcg cggttggttg acctgccgac gtatccgttc 10380 
caacgacgcc actactgggc ccagacctcg cccgctggcg tcgggacggc cgcggcggcc 10440 
cggttcggca tggagtggga ggaccatccc ctgctcggcg gtgcgctgtc ggtcgggggc 10500 
tccaggagcc tgcttctggc cgggcatctg tcgctcgcct cgcacgcctg gctgaccgac 10560
catgccgtct ccggcaccgt gctgctgccc ggtacggcct tcgtggaact cgccctgcac 10620 
gccgccgctg cggctggctg tccggaggtc gaggagctgc gofctggaggc tcccctggtg 10680
gtgccggcca ggggcggggt gcggctccag gtgctcgtgg acgaccccga cgacggatcc 10740
gaccgccgcg cggtaagcgt gttctcccgg gacgatgcgg cgccggccga gtccgcctgg 10800
acgcggcacg cggtgggcgt cctggccgcg cggtcgcggc ctgcaccggc tgcgccctgg 10860
cacaccgacg cctggccacc ttcgggcacg gagccggtcg acgtggccga cctgtatgag 10920
cggttcgcgg cgctgggcta cgagtacggg gaggcgttcg ccgggctcca gggggtctgg 10980
cggggggacg gcgaggtgtt cgccgaggtg cggctgcccg accgggtcag cgcggaggcc 11040
attcgcttcg ggctgcatcc cgcgctgctc gacgccgccc tgcaggggtg gttggcgggc 11100
gacctcgtcg gcgtccccga gggcagtgtg ctgctgccct tcgcctggca gggcgtcgtg 11160
ctccacgcca ccggcgccga cactctgcgg gttcgcatcg gccggtccgg tgactcggcc 11220
gtctgcctgc acgcggtgga cccggccggt gctccggtcc tctcgttgga cgccctggcc 11280
ctgcgtccgc tcgtccggga acgcctcggg ctgcccgccg atgccggagc cggggcgttg 11340
taccgggtcg gctggcggcg gcaggccgcc gttgccgggg cagccgaccg gcggtgggcg 11400
gtcgtggccc cgaacggtgc cgaggcggac ggggccgccg agccgcaccg gtggccggtc 11460
gccgccgtcg acgtgcacac cgacgtggac tcgctgcggg cggccctgga cgcgggcgcg 11520
gaactgcccg ccgtcgtcct cgccgacttc cggagggccg ccggctggag cgtcgacagt 11580
tcgctggccg ccggcccgtc gcccaacgac ggcgcggtgg gcgacggcgc ggtgggcgac 11640
gcccgggccg gggccgtccg ggcggcgacc cgggccgggc tggatctgct gcaacgctgg 11700
ctggccgacg agcggttcat cgcggccagg ctcgtggtgg tcaccgaacg ggccgtggcc 11760
gccgggccgg acgaggacgt gccgggcctc gtccacgcgg gactgtgggg cctgctccgg 11820
tcggcccaat cggagcaccc ggaccgcttc gtgctggtgg acgtcgacgc ggacgacagc 11880
tcgctcgcgg cgctgccgtc ggccctcgcc atggacgcgc cccaactggt ggtgcgggcc 11940
ggtcagatcc tgctgcccga gatcgagccg gtgcggcccg tacccgagcc ggagcaggcg 12000
gaacccgaac cgggggccgt cctggacccc gacggcacgg tcctgctcac cggcgcgacc 12060
ggcacgctcg gcgggctgct cgcccggcac ctggtgacca cccgtggtgc gcgccggctg 12120
ctgctggtca gccgcagcgg tccggacgcc cccgatgccg gccggctgac cgaggagctg 12180
accgggctcg gcgcccacgt gacgctggcc gcctgcgaca ccacggatcg cgccgcgctg 12240
gccggcgtcc tgggcggcat ccccgccgag catccgctga ccgccgtggt gcacgtggcc 12300
ggcgtactcg acgacggggc ggtgcaggcg ctcacccccg agcgggtcga cgcggtgctc 12360
cggccgaagg tggacgcggc actgcacctg cacgaactga ccgcggggct gccgctggcc 12420
gcgttcgtgc tgttctccgg ggcggcgggg atcctgggcc ggcccggcca ggccaactac 12480
gcggcggcga acaccttcct ggacgccctg gcgcagcacc gacgggcccg gggcctgccc 12540
ggcgtctccc tcgcctgggg cctgtggggg ctggccagcg acatgacggg ccacctgggc 12600
gagcaggacc tgcggcggat gcggcgctcc ggcatcgccc cgatgaccgg cgaggagggc 12660
ctcgcgctgt tcgacctggc cctcgacctg gcccgggacg aaccggtgct cgtaccggcc 12720
cgactggacc cggcggcgct gcgccgggag tgggccgcca acggaccggg cgccgtcccg 12780
gtcctgctgc ggggtctggt gccggcggct ccgctccgtc gcgcggcccc gtcgggcgcc 12840
gccggcggtg cgcccgtgcc cgccgtcgcc gcgccgcagc aggcggacga gctgcgcggg 12900
caactggccg ggaaggacgc gcaggcccag gtccggcagc tgctggatct ggtacgcgcc 12960
catgtcgccg gggtgctcgc cctccgggaa gcggcggacg tggacccggg cagaccgttc 13020
cgcgaggtcg gattcgactc gttgaccgca gtcgaactgc gcaaccggct gggctcggcg 13080
accggcctgc ggttggcacc gagcctggtg ttcgaccatc cgaccccgtc ggccgtggcc 13140
gagcacctcg tggaccgcct cgccgccgag ggggcggctg acgagggcgc ggcggcactg 13200
accgggctcg acgcagtggc cgcggcgctc ggcgggatgc ggacggacga cgttcgccgg 13260
gacatcgtcc gcaggcggct ggaggagatg ctcgccctgg tcggcgggcc acggtccggg 13320
ccggcaggtg acgggctggt ggatgccacg gtcgccgagc gactggactc ggcttccgac 13380
gacgaactct tcgccctgat cgaggagcag ctgtga 13416

<210> 12
<211> 1925
<212> PRT
<213> Micromonospora carbonacea subspecies aurantiaca
<400> 12

Val Thr Ala Asn Glu Asp Arg Met Arg Glu Tyr Leu Lys Arg Val Thr
1 5 10 15

Ala Glu Leu Ala Gly Thr Arg Arg Arg Leu Arg Glu Leu Glu Asp Ser
20 25 30

Ala Arg Glu Pro Ile Ala Ile Val Gly Met Ser Cys Arg Leu Pro Gly
35 40 45

Gly Val Ser Thr Pro Glu Asp Leu Trp Arg Leu Val Glu Ala Gly Thr
50 55 60

Asp Ala Ile Ser Gly Phe Pro Asp Asp Arg Gly Trp Asp Val Gly Arg
65 70 75 80



Leu Tyr Asp Pro Asp Pro Asp Ser Thr Gly Thr Ser Tyr Val Arg Glu 
85 90 95 

Gly Gly Phe Leu Tyr Asp Cys Ala Glu Phe Asp Pro Glu Phe Phe Thr 
100 105 110 

Val Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gln Gln Arg Leu Leu
115 120 125 

Leu Glu Ala Ala Trp Glu Thr Phe Glu Arg Ala Gly Ile Ala Pro Asp
130 135 140

Ser Ala Arg Gly Thr Arg Thr Gly Val Tyr Val Gly Val Met Tyr Asp
145 150 155 160

Asp Tyr Gly Ser Arg Leu Ser Glu Val Pro Lys Asp Leu Glu Gly Tyr
165 170 175

Leu Val Asn Gly Ser Ala Gly Ser Val Ala Ser Gly Arg Ile Ala Tyr
180 185 190

Thr Leu Gly Leu Gln Gly Pro Ala Val Thr Val Asp Thr Ala Cys Ser
195 200 205

Ser Ser Leu Val Ala Leu His Leu Ala Val Gln Ala Leu Arg Ser Gly
210 215 220

Glu Cys Glu Leu Ala Leu Ala Gly Gly Ala Thr Val Leu Ala Thr Pro
225 230 235 240

Thr Met Phe Val Asp Phe Ala Arg Gln Arg Gly Leu Ala Glu Asp Gly
245 250 255

Arg Cys Lys Ala Phe Ala Asp Ala Ala Asp Gly Thr Gly Phe Gly Glu
260 265 270

Gly Val Gly Met Leu Leu Val Glu Arg Leu Ser Asp Ala Val Arg Asn
275 280 285

Arg Arg Gln Val Leu Ala Val Val Arg Gly Ser Ala Val Asn Gln Asp
290 295 300

Gly Ala Ser Asn Gly Leu Thr Ala Pro Asn Gly Thr Ala Gln Gln Leu
305 310 315 320

Val Ile Arg Gln Ala Leu Thr Asn Ala Gly Leu Ala Ala Asp Glu Val
325 330 335

Asp Ala Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Ile
340 345 350

Glu Ala Gln Ala Leu Leu Ala Thr Tyr Gly Gln Gly Arg Pro Ala Asp
355 360 365

Arg Pro Leu Leu Leu Gly Ser Leu Lys Ser Asn Ile Gly His Thr Gln
370 375 380

Ala Ala Ala Gly Val Ala Gly Val Ile Lys Thr Val Leu Ala Leu Arg
385 390 395 400

His Ala Arg Leu Pro Arg Thr Leu His Val Asp Arg Pro Ser Thr Arg 
405 410 415 

Val Asp Trp Ser Ser Gly Ala Val Arg Leu Leu Thr Glu Gly Arg Pro 
420 425 430 

Trp Pro Asp His Gly Asp Arg Pro Arg Arg Ala Gly Val Ser Ser Phe 
435 440 445 

Gly Ala Ser Gly Thr Asn Ala His Val Ile Leu Glu Ser Ala Pro Gly
450 455 460 

Ala Ala Ala Gly Ala Thr Gly Ala Thr Asp Leu Ser Ala Pro Pro Ala 
465 470 475 480 

Ser Val Ala His His Pro Ala Thr Ala Thr Ala Thr Ala Pro Ala Ala 
485 490 495 

Thr Val Pro Thr Ala His Glu Pro Ala Gly Thr Ala Gly Asp Asp Pro 
500 505 510 

Val Trp Val Leu Ser Gly Arg Thr Glu Ala Ala Leu Arg Glu Gln Ala
515 520 525 

Arg Arg Leu His Ala His Leu Thr Ser Arg Ala Arg Pro Glu Pro Ala 
530 535 540 

Asp Ala Val Ala Arg Ala Leu Ala Arg Ser Arg Thr Ala Phe Ala Tyr 
545 550 555 560 

Arg Ala Ala Val Leu Gly Arg Asp Asp Thr Ala Arg Leu Asp Gly Leu
565 570 575

His Ala Leu Ala Ala Gly Arg Ser Ala Ala Gly Leu Val Thr Gly Arg
580 585 590

Ala Val Pro Glu Arg Arg Val Ala Phe Leu Phe Thr Gly Gln Gly Ser
595 600 605

Gln Arg Pro Gly Ala Gly Arg Glu Leu Tyr Ala Arg His Pro Ala Phe
610 615 620

Ala Gln Ala Leu Asp Gly Val Leu Ala Glu Leu Asp Arg His Leu Asp
625 630 635 640

Arg Pro Leu Arg Ala Val Met Leu Ala Glu Pro Gly Thr Glu Ala Ala
645 650 655

Ala Leu Leu Asp Asp Thr Ala Tyr Thr Gln Pro Ala Leu Phe Ala Leu
660 665 670

Glu Val Ala Leu Phe Arg Leu Val Thr Ser Trp Gly Leu Arg Pro Asp
675 680 685

Ala Leu Leu Gly His Ser Val Gly Glu Ile Thr Ala Ala Tyr Val Ala
690 695 700

Gly Val Leu Thr Leu Pro Asp Ala Ala Arg Leu Val Ala Val Arg Gly
705 710 715 720

Arg Leu Met Ala Asp Leu Arg Ala Gly Gly Ala Met Ala Ala Leu Gln
725 730 735 

Ala Ala Glu Ser Glu Val Asp Pro Leu Leu Ala Gly Arg Glu Gly Glu 
740 745 750 

Leu Ser Ile Ala Ala Val Asn Gly Pro Gln Ala Thr Val Ile Ala Gly
755 760 765

Asp Glu Ala Ala Val Glu Glu Gln Val Ala Leu Trp Arg Asp Arg Gly
770 775 780 

Arg Arg Ala Arg Arg Leu Arg Val Gly His Ala Phe His Ser Val Arg 
785 790 795 800 

Met Asp Gly Met Leu Ala Glu Phe Glu Lys Ala Met Gly Asp Leu Arg 
805 810 815 

Ala Gly Glu Pro Thr Ile Pro Val Val Ala Asn Val Arg Gly Ala Ile
820 825 830

Ala Ser Gly Thr Asp Leu Arg Thr Ala Gly Tyr Trp Ile Arg His Ala
835 840 845

Arg Glu Pro Val Arg Phe Leu Asp Gly Met Arg Ala Leu Arg Ala Glu
850 855 860

Gly Val Asp Thr Phe Val Glu Leu Gly Pro Asp Gly Val Leu Thr Ala 
865 870 875 880 

Met Ala Arg Asp Cys Leu Ala Asp Pro Ala Asp Pro Val Asp Leu Ala
885 890 895

Asp Ala Ala Glu Pro Ala Gly Ala Ala Glu Pro Asp Arg Ser Leu Leu
900 905 910

Phe Leu Pro Thr Leu Arg Arg Asp Arg Asp Asp Ala Val Ala Val Arg
915 920 925

Glu Ala Leu Ala Ser Val His Val His Gly Leu Pro Val Asp Pro Val
930 935 940

Ala Pro Leu Gly Asp Gly Pro Leu Ala Thr Asp Leu Pro Thr Tyr Pro
945 950 955 960

Phe Gln Arg Ser Arg Tyr Trp Leu Asp Pro Arg Pro Gly Ala Arg Asp
965 970 975

Leu Thr Ala Val Gly Leu Asp Val Ala Gly His Pro Leu Leu Ala Val
980 985 990

Ala Val Asp Leu Pro Asp Gly Ala Gly Thr Val Trp Ser Gly Gln Leu
995 1000 1005 



Cys Val Arg Thr His Pro Trp Leu Ala Asp His Ser Val Trp Gly 
1010 1015 1020 

Arg Thr Val Val Pro Gly Thr Ala Leu Leu Glu Ile Met His Arg
1025 1030 1035

Val Arg Ala Glu Val Gly Cys Thr Arg Val Ala Glu Leu Thr Phe 
1040 1045 1050 

Glu Ala Pro Met Val Leu Ala Asp Asp Gly Gly Val Arg Val Arg 
1055 1060 1065 



Val Val Val Asp Gly Pro Asp Ala Asp Gly Ala Arg Gln Val Arg
1070 1075 1080

Ile His Ser Ala Pro Val Gly Pro Glu Pro Pro His Trp Thr Arg
1085 1090 1095

His Ala Ser Gly Arg Val Asp Ser Ala Ala Pro Gly Pro Ala Ala
1100 1105 1110

Gly Pro Pro Ala Trp Asp Ala Gly Pro Gly Ser Asn Trp Pro Pro 
1115 1120 1125 

Glu Gly Ala Glu Pro Val Gly Val Glu Ser Glu Tyr Glu Arg Phe 
1130 1135 1140 



Ala Asp Asn Gly Ile Gly Tyr Gly Pro Ala Phe Arg Gly Leu Arg
1145 1150 1155 

Ala Ala Trp Arg Arg Gly Asn Glu Thr Phe Ala Glu Val Arg Leu 
1160 1165 1170 

Pro Glu Gly Tyr Ala Ala Glu Ala Gly Asp Tyr Ala Val His Pro
1175 1180 1185

Ala Leu Leu Asp Ala Ala Leu His Ala Ile Val Phe Gly Asp Gln
1190 1195 1200 

Phe Pro Gly Gly Ala His Gly Met Leu Pro Phe Ala Phe Thr Asp 
1205 1210 1215 

Val Arg Val Phe Ser Ser Gly Ala Asp Arg Leu Arg Val Arg Ile
1220 1225 1230 

Ala Pro Ala Asp TQa Asp Ser Val Cys Val Thr Val Ala Asp Gly 
1235 1240 1245 

Asp Gly Thr Pro Val Leu Ala Ala Ala Thr Leu Ala Leu Arg Arg 
1250 1255 1260 

Val Ala Ala Asp Arg Ile Ala Ala Thr Val Thr Gly Gln Ala Pro
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1265 1270 1275 

Leu Tyr Arg Leu Glu Trp Ser Ala Val Arg Pro Ala Pro Val Ala 
1280 1285 1290 

Thr Gly Ala Arg Phe Ala Val Val Gly Ala Asp Ala Pro Leu Pro 
1295 1300 1305 

Ser Gly Ala Leu Gly Ala Gly Val Pro Val Gln Ala Tyr Pro Asp
1310 1315 1320 

Leu Gly Ala Leu Ala Gly Ala Leu Ala Thr Asn Gly Ala Pro Gly 
1325 1330 1335 

His Val Leu Val Asp Phe Arg Arg Arg Ala Asp Gly Pro Ala Gly 
1340 1345 1350 

Arg Gln Pro Gly Asp Val Gly Ala Arg Thr Arg Arg Ala Leu Ala
1355 1360 1365

Val Val Gln Glu Trp Leu Ala Asp Asp Arg Phe Thr Gly Ser Arg
1370 1375 1380

Leu Val Val Leu Thr Ser Gly Ala Val Asp Ala Gly Thr Ala Val
1385 1390 1395 

Thr Asp Pro Ala Ala Ala Gly Val Trp Gly Leu Leu Arg Val Ala 
1400 1405 1410 

Gln Thr Glu His Pro Asp Arg Phe Val Leu Val Asp Thr Asp Asp
1415 1420 1425

His Pro Asp Ser Leu Arg Ala Leu Pro Gly Ala Ile Val Ala Gly
1430 1435 1440

Glu Pro Gln Leu Ala Leu Arg Ala Gly Thr Ala Ser Val Pro Gly
1445 1450 1455 

Leu Val Arg Val Pro Ala Gly Thr Gly Ala Ala Pro Pro Trp Ala 
1460 1465 1470 

Ala Ala Gly Thr Val Leu Val Thr Gly Gly Thr Gly Met Leu Gly 
1475 1480 1485

Gly Ala Val Ala Arg His Leu Val Arg Arg His Gly Val Arg Arg
1490 1495 1500

Leu Leu Leu Val Gly Arg Arg Gly Pro Asp Ala Pro Gly Ala Ala 
1505 1510 1515 

Ala Leu Thr Arg Glu Leu Glu Glu Leu Gly Ala Ser Val Arg Val 
1520 1525 1530 

Ala Ala Cys Asp Val Gly Asp Arg Gly Ala Val Thr Arg Leu Leu 
1535 1540 1545 

Ala Gly Val Pro Ala Ala His Pro Leu Thr Ala Val Val His Ser
1550 1555 1560

Ala Gly Leu Pro Asp Asp Gly Val Leu Thr Ala Gln Thr Gly Glu
1565 1570 1575 

Arg Val Ala Ala Val Leu Arg Ala Lys Ala Asp Ala Ala Val Asn 
1580 1585 1590 

Leu His Glu Leu Thr Arg His Leu Asp Leu Thr Ala Phe Val Leu 
1595 1600 1605 

Phe Ser Ser Val Ala Gly Thr Ile Gly Ser Ala Gly Gln Ala Gly
1610 1615 1620

Tyr Ala Ala Ala Asn Ala Phe Leu Asp Ala Phe Ala Ser Trp Arg
1625 1630 1635

Gln Gly Gln Gly Leu Pro Ala Thr Ala Leu Ala Trp Gly Pro Leu
1640 1645 1650

Asp Gly Gly Met Ala Ala Gly Leu Gly Thr Ala Asp Val Ala Arg 
1655 1660 1665 

Leu Arg Arg Ser Gly Leu Val Pro Leu Gly Val Asp Asp Ala Leu 
1670 1675 1680 

Val Leu Phe Asp Ala Ala Cys Ser Arg Pro Ala Ala Ala Tyr His 
1685 1690 1695 

Pro Val Arg Leu Asp Pro Ala Val Leu Arg Ser His Ala Ala Ala 
1700 1705 1710 

Asp Ser Ala Val Pro Ala Val Leu Leu Gly Pro Ser Arg Ala His 
1715 1720 1725 

Pro Arg Asp Gly Thr Pro Gly Lys Pro Ala Glu Ala Ala Leu Ala 
1730 1735 1740 

Ala Leu Leu Thr Gly Arg Ser Ala Ala Glu Arg Thr Ala Ile Leu
1745 1750 1755 

Thr Asp Leu Val Arg Thr Glu Ala Ala Ala Val Leu Gly His Gly 
1760 1765 1770 

Glu Ala Ala Met Leu Ser Thr Gln Arg Ala Phe Arg Asp Ala Gly
1775 1780 1785 

Phe Asp Ser Leu Thr Ala Val Asp Leu Arg Asn Arg Leu Gly Ala 
1790 1795 1800 

Ala Thr Gly Leu Ser Leu Pro Ala Ala Val Val Phe Asp His Pro 
1805 1810 1815 

Thr Pro Ala Ala Leu Ala Ala Tyr Leu Arg Thr Glu Leu Asp Arg 
1820 1825 1830 

Arg Ser Pro Thr Gly Gln Gln Phe Pro Thr Asp Ala Ala Gly Val
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1835 1840 1845 

Leu Ala Met Leu Asp Arg Leu Arg Asp Gly Ile Ala Thr Val Val
1850 1855 1860

Arg Asp Asp Ala Asp Arg Thr Arg Ala Ala Asp Leu Leu Arg Val 
1865 1870 1875 

Leu Leu Ala Glu Val Gly Gly Pro Gly Thr Gly Pro Pro Arg Asp 
1880 1885 1890 

Thr Asp Gly Gly Ser Gly Gly Glu Val Ser Asp Arg Leu Arg Thr 
1895 1900 1905 

Ala Ser Asp Glu Glu Leu Phe Asp Leu Leu Asp Ser Asp Phe Arg
1910 1915 1920 

Leu Ala 
1925 

<210> 13
<211> 5778
<212> DNA
<213> micromonospora carbonacea subspecies aurantiaca
<400> 13
gtgaccgcga acgaggaccg gatgcgtgag tacctcaagc gggtcaccgc cgagctggcc 60


gggacgcggc 


gacgcctgcg 


G aacTC t QcraQ 


qacacfCQCQc 

^ M V» Vi%<^ W 


gtgagcccat cgcgatcgtg 


120 


ggcatgagct gccggttgcc gggcggggtg agcacgcccg aggacctgtg gcggctggtc 180

gaggccggta ccgacgcgat ctccggcttc cccgacgacc ggggctggga tgtcgggagg 240

ctctacgacc cggatccgga ctcgaccgga acgagctacg tgcgcgaggg cggcttcctc 300

tacgactgcg ccgagttcga cccggagttc ttcaccgtct cgccccgcga ggcgctggcc 360

atggacccgc agcagcggct gctgctggag gccgcctggg agaccttcga acgggcgggg 420

atcgcccccg actcggcccg cggcacccgc accggggtct acgtcggggt gatgtacgac 480

gactacggca gccggctgtc ggaggtgccg aaggacctgg agggctacct ggtcaacggc 540

agcgcgggca gtgtcgcgtc gggccggatc gcgtacacgc tggggttgca ggggccggcg 600

gtgacggtcg acacggcctg ctcgtcgtcg ctggtcgcgt tgcacctggc cgtgcaggcg 660

ctgcggtcgg gcgagtgtga gctggccctg gcgggcgggg cgacggtgct cgccacgccg 720

acgatgttcg tcgacttcgc ccggcagcgc ggtctcgccg aggacggccg ttgcaaggcg 780

ttcgcggacg ccgccgacgg gaccgggttc ggcgagggcg tggggatgct gctggtggaa 840

cggctctcgg acgcggtccg caaccgtcgc caggtgctgg ccgtcgtgcg gggcagcgcg 900

gtcaaccagg acggggcgag caacggcctg accgccccga acggtacggc ccagcaactg 960

gtcatccggc aggcgttgac caacgcgggg ctggccgcgg acgaggtgga cgcggtggag 1020

gcacacggca ccggcacccg gctgggcgat ccgatcgagg cgcaggcgct gctggcgacg 1080

tacggccagg gccggccggc ggaccggccg ctcctgctgg gatccctgaa gtccaacatc 1140

ggccacaccc aggccgccgc aggggtcgcc ggggtgatca agaccgtgct ggcgctgcgt 1200

cacgcgcggc tgccccggac cctgcacgtc gatcgcccct cgacccgggt ggactggtcg 1260

tcgggcgcgg tgcggctgct gaccgagggg cggccctggc ccgatcacgg cgaccggccc 1320

cgccgggccg gggtctcctc gttcggcgcg agcggcacca acgcgcacgt catcctggag 1380

agcgcccccg gtgcggcggc gggggcgacc ggggcgacgg acctctcggc cccgccggca 1440

tccgtcgccc accatccggc cacggccacg gccacggccc cggcggcgac ggtgcccact 1500

gcccacgaac cggcggggac ggccggcgac gaccccgtct gggtcctgtc cggccggacc 1560

gaggcggccc tgcgcgagca ggcccggcgg ctacacgccc acctgacatc ccgggcgcgg 1620

cccgagcccg ccgacgccgt ggcccgcgcg ctggcgcgct cccgcaccgc gttcgcgtac 1680

cgggccgccg tgctgggccg ggacgacacc gcgcggctcg acggcctcca cgcgctcgcg 1740

gcgggtcgca gcgccgcggg gctcgtcacc gggcgggccg tgccggagcg gcgcgtggcc 1800

ttcctcttca ccgggcaggg cagccagcga ccgggcgcgg gccgggaact gtacgcccgg 1860

catcccgcct tcgcacaggc cctggacggc gtcctcgcgg aactcgaccg gcacctggac 1920

cggccgctgc gcgccgtcat gctcgccgag ccgggcaccg aggcggcggc gctgctggac 1980

gacaccgcgt acacccagcc cgccctgttc gcgctggagg tggcgctgtt ccggctggtc 2040

acgagctggg ggctgcggcc tgacgccctg ctgggccact cggtcgggga gatcaccgcg 2100

gcgtacgtcg cgggcgtcct caccctgccg gacgccgccc ggctggtggc ggtgcgcggt 2160

cgactcatgg cggacctgcg ggccggcggt gcgatggccg cgctccaggc cgccgagagc 2220

gaggtcgacc ccctgttggc ggggcgggag ggcgaactgt cgatcgcagc ggtcaacggg 2280

ccgcaggcaa ccgtgatcgc gggcgacgag gcggccgtcg aggagcaggt cgcgctgtgg 2340

cgtgaccggg gtcgccgggc caggcgactg cgggtcggcc acgccttcca ctccgtacgg 2400


ctuggacggga 




/-»+* 4- ^rYa rra a rr 
g U C» acl^ 


gcgacgggtg 




wy y u-y ay u cy 




acgatccccg tggtcgccaa cgtcaggggg gcgatcgcgt ccggcaccga cctccgtacg 2520

gccgggtact ggatccggca cgcccgcgag ccggtgcgtt tcctcgacgg catgcgtgcg 2580

ctgcgggccg agggcgtcga cacgttcgtg gaactcggcc ccgacggagt gctcacggcg 2640
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a. uy y uy \,»y 


eii^ L>y ^yy ^ 


y y d u ih> ^y ^ 


era rr'pacit' crcr 
y dVi^^^yy i^y y 


ci b v« i« wy ^y y A 


POP POP ccracf 


2700 


ccv-yv^uyyyy 


^oy wyyciy 


^y d V* Wi>y ^ w 


^ ^y w ^y ^ w 


tcrcppapppt" 


yeyLo^yyy aw 


2760 




vvCiy L«yy^^y i> 


yyy dyy 


^ uy y ^ci L>\>^ v^y 


t"pcacat'aP3 


paaac tt ccc 


2820 


y t cy auccy y 


u i^y u>y \«>t>^ y ^ i« 


cyycyduyyL. 


v,.iM>y u^y ^.vwA 


ppoappirrpp 




2880 


uw wdy t«yy L. 


u>L«wy oucto L>y 


y ^ uwy ditf^oi^y 


t'y »-*-*-^yy y y 


w Q ^y w y CLv^ w 


aappaPCCftQ 


2940 


yy ut^y cti^y 


v-yyccyyy t-o 


i^^^y^ i.yu>c^ 


y la^wy i-v*y ^-^^y 


t" crcfa P P t* Cf P P 




3000 


yy i^ct^yy l.^ u 




y u L L, uy L.y uy 


y CL^y \>^ci l« 


pert" CTCTP t" pap 
v»y ^y y e ^ey *^ 


paaccacacTC 


3060 


gcgcgggggc 


gcacggcggu 


9 c cggy y ac u 


ycgcuycuyy 


ana f~ r^a frroa 
dyducdcy cd 


ev^ydy L>y v<>y e 


3120 


gc cy aggtyy 


ycuycauccy 


y y u cy cy y dd 


c uy dec u u cy 


a crapoppcra ^ 
dy y ey e i^y ci u> 


aat" ap t* acrcp 
yy i-y e uyy 


3180 


gacyacggyy 


y cy u ccy wy u 


ycgygucy LC 


y ccydcyydc 


r«arra Prtppfra 
cdy dcy ^wyci 


ey y y y *«»y 'x* 


3240 


Cayy uCCyyci 


ucco-ci^ccyc 


dC cy y t y y yy 


cccydy cc i»c 


r^r<r»a r^^crcfa p 
ei^Ufdiarf L>yy de 


eeyy OAoy w w 


33 op 


ucyyy ccy L.y 


4- /~i<Ta r*^cxt^c^^ 

L. L.y dcay wy 


cycgccgggg 


ccyyccy ev-'y 


fTP p p a p rTTf p 
y LrCioev^y e 


y ^yyydv,*y ww 


3360 


yyoL>ot.yyua 


yc^dav^ uy y 


ycccgagggg 


ycyydyccyy 


t* Cf era pert" paa 


aaapaaatac 


3420 


yayoycuu^y 


^/-i/~ra /**a a f^ncx 
u>u>y dLodd^yy 


v.* d u iii*y y d u d\^ 


orofpp p pcf p p t" 


^ p paa acsan t* 

eeey ciy yy w ^ 


y 


3480 


^yycytuyCy 


y y del cy dy dC 


y uucy(.«v.<ydy 


y cccyy c ucc 


ccy dyyyy ud 


pappoppciaa 

wy wV^y wwy CLy 


3540 


yCyggcya.cu 


dCy ccy uccd 


uccyycdcuy 


cuggdcgcgg 


ccc L»y L^dcyc 


era h poh P^ ^ P ' 
yducyueiwi^e 


3600 


ggugaccagt. 


uucccyy uyy 


yycdCdcyyy 


dugci^yccy u 


ccy cccccdc 


w«yci\^y i-y *^yy 


3660 


y^y i^ucaycu 


ccyycyccyd 


ccyy c u i^wy y 


gcycycdccy 


wy ee^Miy v^oy d 


"happaapi'pa 


3720 


y ucuycy uyA 


ucy uuyccyd 


cggcgacggg 


dcyccyy i..we 


f" rrtr'PfTPaPiP 
i« wy w iM>y dy 


pa p p p "t" acfpo 
wo v-> ^ w u.y y lany 


3780 


t uy cgc egg g 


ucyccyccyd 


c c yy d t. cy cy 


y cy dccy ucd 


ccy y ^*cdyy \.* 


a opan'^ at" a p 


3840 


cggu ty y ay u 


yy UQ^'Oyov^y u 


y L^y y ^v^v^y 


ceyy uyy 


ccyyyycycy 


at't*pappCTt*p 

y L> L»^y ^wy 


3900 


g u cggcgcgg 


dcgccccgct. 


yccguccgy u 


gcgccggggg 


ccyyyy uy cc 


cy uccdyy cy 




uaCCCyyaCw 




yyccyycycy 


t* h crcrp p a p pa 


a paoocrpa p c 
dVi»y y y y ecn->e 


yyy ww»v*y uy 


4020 


cu^y ucyowu 


u \« ^ ^y ^ oy 


u>y ^ ^y d^y y !■.* 


ccyycdyyyc 


aac aac c eaa 

yy eay ^^wyy 


t aa Got acrot 
^y ciwy ^y yy ^ 


4080 


ycacyyaCwC 


yacyyycy^u 


nnr* f^ex^ r^n^ 

yy ccyv-cy uc 


p a prra cr t" oar* 
cdyydy ^yy ^ 


\^wy w^y cLwy c& 


ppai" t* 1" papp 


4140 


ggcuCacgyc 


uyy ucy uycu 


Cdccagcgyd 


gccgcggdcy 


ccyy ddv^dy c 


L^y L>edccy dVo 


4200 


ccggccgccg ccggggtgtg gggcctgctg cgggtcgccc agaccgagca tccggaccgg 4260

ttcgtcctcg tggacaccga cgaccacccg gattcgctgc gtgccctccc cggggcgatc 4320

gttgcgggcg agccgcagct ggcactgcgg gccggcacgg ccagcgttcc gggcctggtg 4380
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cgggtgccgg ccggcaccgg tgccgccccg ccgtgggccg cagccggcac cgtcctcgtc 4440 

accgggggca ccggcatgct cggcggcgcg gtggcccggc acctggtccg ccggcacggg 4500 

gtccgccgcc tgctgctggt cggccggcgc gggccggacg cacccggcgc ggcggccctg 4560 

acccgggaac tggaggagct gggagcgtcc gtccgcgtcg ccgcctgcga cgtcggcgat 4620 

cgtggcgcgg tgacgcgcct gttggccggg gttcccgccg cgcatccgct caccgcggtg 4680 

gtgcactcgg ccggcctgcc cgacgacggc gtgctgaccg cacagaccgg cgagcgggtc 4740 

gcggcggtgc tccgcgccaa ggcggacgca gcggtcaacc tgcacgaact cacccggcat 4800 

ctcgacctca ccgccttcgt gctgttctcg tcggtagcgg ggacgatcgg cagcgccggg 4860 

caggccgggt acgccgccgc gaacgccttc ctcgacgcgt tcgcgagctg gcggcagggc 4920 

caggggctgc ccgccaccgc cctggcgtgg gggccgttgg acggcgggat ggccgccggc 4980 

ctcggcactg cggacgtggc acggctgcgc cggtccgggc tcgtgccgct cggcgtggac 5040 

gacgcgctcg ttctcttcga cgccgcctgc tcccgaccgg cggcggcgta ccaccccgtc 5100 

cgcctcgatc cggcggtgct gcggtcccac gccgccgccg acagcgcggt gcccgccgtc 5160 

ctgctcggtc cgagccgtgc gcacccgagg gacggtacgc cggggaagcc tgccgaagcc 5220 

gccctcgccg cgctgctgac cggcaggtcg gcggccgagc gtacggcgat cctgaccgac 5280

ctggtgcgga cggaggccgc cgccgttctc gggcatggcg aggcggcgat gctgagcacg 5340 

cagcgggcct tccgcgacgc cggcttcgac tcgctcaccg ccgtggacct ccgcaaccgg 5400 

ctcggcgcgg ccacgggcct cagcctgccg gccgccgtcg tcttcgacca cccgaccccg 5460 

gcggccctgg ccgcctatct gcggaccgaa ctggaccgcc ggtcgcccac cgggcaacag 5520 

ttcccgacgg acgccgccgg tgttctggcc atgctcgacc gcctgcggga cggaatcgcg 5580 

acggtcgtca gggacgacgc cgaccggacc cgcgcagccg acctgttgcg tgtcctgctc 5640 

gccgaggtcg gcgggcccgg gacgggcccg ccccgcgaca ccgacggcgg ctccggcggc 5700 

gaggtcagcg accgcctccg gaccgcctcc gacgaggaac tgttcgacct gctcgacagc 5760 

gatttccgac tggcgtag 5778 

<210> 14 
<211> 3745 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca
<400> 14 
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Val Ser Val Asn Asn Glu Asp Lys Leu Arg Glu Tyr Leu Arg Arg Ala 
1 5 10 15

Met Ala Asp Leu His Glu Ser Arg Glu Arg Leu Arg Gln Tyr Glu Ser
20 25 30 

Ala Ala Ala Val Asp Asp Pro Val Val Val Val Gly Met Gly Cys Arg 
35 40 45 

Phe Pro Gly Gly Val Val Cys Ala Glu Gly Leu Trp Asp Leu Val Leu 
50 55 60

Gly Gly Gly Asp Ala Val Ser Gly Phe Pro Val Asp Arg Gly Trp Asp 
65 70 75 80 

Val Glu Gly Leu Phe Asp Pro Val Arg Gly Val Val Gly Lys Ser Tyr 
85 90 95 

Val Arg Glu Gly Gly Phe Val Tyr Asp Ala Gly Met Phe Asp Ala Glu 
100 105 110 

Phe Phe Gly Val Ser Pro Arg Glu Ala Val Ala Met Asp Pro Gln Gln
115 120 125

Arg Leu Phe Leu Glu Val Ser Trp Glu Ala Leu Glu Arg Ala Gly Ile
130 135 140

Asp Pro Leu Gly Leu Arg Gly Ser Arg Thr Gly Val Tyr Val Gly Val
145 150 155 160

Met Gly Gln Glu Tyr Gly Pro Arg Leu Val Glu Ser Gly Gly Gly Phe
165 170 175 

Glu Gly Tyr Leu Leu Thr Gly Thr Ser Pro Ser Val Val Ser Gly Arg 
180 185 190 

Val Ser Tyr Val Leu Gly Leu Glu Gly Pro Ser Ile Ser Val Asp Thr
195 200 205

Ala Cys Ser Ser Ser Leu Val Ala Leu His Leu Ala Cys Gln Gly Leu
210 215 220

Arg Leu Gly Glu Cys Asp Val Ala Leu Ala Gly Gly Val Thr Val Ile
225 230 235 240

Ala Ala Pro Gly Leu Phe Val Glu Phe Ser Arg Gln Gly Gly Leu Ser
245 250 255 

Gly Asp Gly Arg Cys Arg Ala Phe Ala Gly Gly Ala Asp Gly Thr Gly 
260 265 270 

Trp Gly Glu Gly Ala Gly Val Val Val Leu Glu Arg Leu Ser Val Ala 
275 280 285 



Arg Glu Arg Gly His Arg Val Leu Ala Val Val Arg Gly Ser Ala Val
290 295 300

Asn Gln Asp Gly Gly Ser Asn Gly Leu Thr Ala Pro Ser Gly Val Ala
305 310 315 320

Gln Arg Arg Val Ile Gly Ala Ala Leu Val Ala Ala Gly Leu Gly Val
325 330 335 

Ser Asp Val Asp Val Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly 
340 345 350 

Asp Pro Ile Glu Ala Glu Ala Leu Leu Gly Ser Tyr Gly Arg Gly Arg
355 360 365

Val Gly Gly Ala Leu Leu Leu Gly Ser Val Lys Ser Asn Ile Gly His
370 375 380

Thr Gln Ala Ala Ala Gly Val Ala Gly Val Ile Lys Met Val Met Ala
385 390 395 400 

Leu Arg Ala Gly Val Val Pro Ala Thr Leu His Val Asp Val Pro Ser 
405 410 415 

Pro Leu Val Asp Trp Ser Ser Gly Gly Val Glu Leu Val Thr Glu Ala 
420 425 430 

Arg Asp Trp Pro Val Val Gly Arg Val Arg Arg Ala Gly Val Ser Ala 
435 440 445 

Phe Gly Val Ser Gly Thr Asn Ala His Leu Ile Leu Glu Gln Ala Pro
450 455 460 

Glu Phe Asp Asp Pro Ala Asp Ser Asp Ser Asp Ser Asp Ser Asp Ala 
465 470 475 480 

Gly Val Val Asp Gly Gly Glu Gly Gly Val Gly Arg Ser Leu Ser Val 
485 490 495 

Val Pro Val Val Val Ser Gly Arg Ser Val Gly Ala Leu Arg Ala Tyr 
500 505 510 

Ala Gly Arg Leu Arg Glu Val Cys Ala Gly Leu Ser Asp Gly Gly Gly 
515 520 525 

Ser Gly Gly Gly Ser Gly Leu Val Asp Val Gly Trp Ser Leu Val Ser 
530 535 540 

Ser Arg Ser Val Phe Glu His Arg Ala Val Val Phe Gly Gly Gly Val 
545 550 555 560

Glu Glu Val Val Ala Gly Leu Gly Ala Val Ala Ser Gly Ala Val Ala 
565 570 575 

Ser Gly Ser Val Val Val Gly Ser Val Ala Ser Gly Val Ala Gly Gly 
580 585 590 

Gly Gly Arg Val Val Phe Val Phe Pro Gly Gln Gly Trp Gln Trp Val
595 600 605 
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Gly Met Gly Ala Ala Leu Leu Asp Glu Ser Glu Val Phe Ala Glu Ser 
610 615 620 

Met Val Glu Cys Gly Arg Ala Leu Ser Gly Phe Val Asp Trp Asp Leu 
625 630 635 640 

Leu Glu Val Val Arg Gly Gly Ala Gly Glu Gly Val Trp Gly Arg Val 
645 650 655

Asp Val Val Gln Pro Val Ser Trp Ala Val Met Val Ser Leu Ala Arg
660 665 670 

Leu Trp Met Ser Val Gly Val Val Pro Asp Ala Val Val Gly His Ser 
675 680 685 

Gln Gly Glu Val Ala Ala Ala Val Val Gly Gly Val Leu Ser Val Ala
690 695 700

Asp Gly Ala Arg Val Val Ala Leu Arg Ser Arg Val Ile Gly Glu Val
705 710 715 720

Leu Ala Gly Gly Gly Ala Met Val Ser Val Gly Leu Pro Ile Val Asp
725 730 735

Val Gln Glu Arg Leu Ala Gly Trp Gly Gly Arg Leu Gly Val Ala Ala
740 745 750 

Val Asn Gly Pro Ser Leu Thr Val Val Ser Gly Asp Val Asp Ala Ala 
755 760 765 

Val Gly Phe Val Gly Glu Cys Glu Arg Asp Gly Val Trp Val Arg Arg 
770 775 780 

Val Ala Val Asp Tyr Ala Ser His Ser Ala His Val Glu Ala Val Glu 
785 790 795 800 

Gly Met Leu Ser Gly Leu Leu Gly Gly Leu Cys Pro Gly Arg Gly Val 
805 810 815 

Val Pro Phe Tyr Ser Ser Val Val Gly Gly Val Val Asp Gly Val Gly
820 825 830

Leu Asp Gly Gly Tyr Trp Tyr Arg Asn Leu Arg Glu Arg Val Leu Phe 
835 840 845 

Ser Asp Val Val Gly Arg Leu Val Gly Asp Gly Phe Ser Gly Phe Val 
850 855 860 

Glu Cys Ser Gly His Pro Val Leu Ala Gly Gly Val Leu Glu Ser Val 
865 870 875 880 

Ala Val Val Asp Pro Asp Val Arg Pro Val Val Val Gly Ser Leu Arg 
885 890 895 



Arg Asp Asp Gly Gly Trp Gly Arg Phe Leu Thr Ser Val Gly Glu Ala 
900 905 910 
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Phe Val Gly Gly Met Ser Val Asp Trp Lys Gly Val Phe Ala Gly Ala 
915 920 925 

Gly Ala Arg Leu Val Asp Leu Pro Thr Tyr Pro Phe Gln Arg Arg His
930 935 940

Tyr Trp Ala Pro Thr Pro Thr Asn Pro Ala Thr Asn Pro Ala Thr Asn
945 950 955 960

Pro Ala Thr Asn Pro Ala Thr Gly Asp Thr Thr Thr Ala Asp Pro Ala
965 970 975

Gly Asp Leu Arg Tyr Arg Ile Thr Trp Lys Pro Leu Pro Thr Asp Asp
980 985 990

Pro Arg Pro Leu Thr Asn Arg Trp Leu Leu Met Val Pro Glu Ala Leu
995 1000 1005

Ala Gly Asp Gly Val Val Ala Gly Val Arg Gln Ala Leu Ala Ala
1010 1015 1020

Arg Gly Ala Ser Val Glu Leu Leu Thr Val Gly Thr Ala Asp Arg
1025 1030 1035

Ala Gly Leu Ala Ala Leu Leu Thr Ser Ala Ala Pro Gly Asp Pro
1040 1045 1050 

Glu Ala Ala Gly Pro Ala Gly Val Val Ser Leu Leu Ala Leu Ala 
1055 1060 1065 

Glu Gly Ala Asp Ala Arg His Pro Ala Val Pro Leu Gly Leu Thr 
1070 1075 1080 

Ala Ser Leu Ala Leu Ile Gln Ala Leu Ala Asp Ala Gly Thr Gln
1085 1090 1095 

Ala Arg Leu Trp Ala Val Thr Arg Gly Ala Val Ala Val Ser Ser 
1100 1105 1110 

Gly Glu Val Pro Asp Ala Gly Gln Ala Gln Val Trp Gly Leu Gly
1115 1120 1125 

Arg Val Ala Ala Leu Glu Leu Pro Asp Arg Trp Gly Gly Leu Val 
1130 1135 1140 

Asp Leu Pro Ala Leu Thr Gly Glu Arg Ala Phe Ala Gln Leu Ala
1145 1150 1155

Asp Val Val Gly Gly Ser Asn Gly Glu Asp Gln Val Ala Val Arg
1160 1165 1170 

Ala Ser Gly Val Tyr Gly Arg Arg Leu Val Arg Ser Arg Ala Thr 
1175 1180 1185 

Val Thr Ser Gly Asp Trp Pro Ala Arg Gly Thr Ile Leu Val Val
1190 1195 1200

Gly Asp Thr Gly Pro Val Ala Ala Leu Leu Ala Gly Arg Leu Leu
1205 1210 1215



Gly Asp Gly Ala Ala His Val Val Leu Ala Gly Pro Ala Ala Ala 
1220 1225 1230 

Ser Thr Val Gly Leu Thr Gly Gly Ala Asp Arg Val Ala Leu Ile
1235 1240 1245 

Asp Cys Asp Pro Ser Asp Arg Asp Ala Leu Ala Gly Leu Leu Gly 
1250 1255 1260

Ala Tyr Arg Pro Thr Thr Ile Val Val Ala Pro Pro Ala Val Ala
1265 1270 1275 

Leu Thr Ala Leu Ala Glu Thr Thr Pro Glu Asp Phe Val Ala Ala 
1280 1285 1290 

Val Ala • Ala Lys ^r Thr Thr Ala Val His Leu Asp Ala Leu Ala 
1295 1300 1305 

Ala Glu Ala Glu Leu Glu Leu Asp Ala Phe Val Val Phe Ser Ser 
1310 1315 1320 

Val Ser Gly Thr Trp Gly Gly Ala Gly His Gly Gly Tyr Ala Ala 
1325 1330 1335 

Gly Thr Ala Arg Leu Asp Ala Leu Val Glu Glu Arg Arg Ala Arg 
1340 1345 1350 

Gly Leu Pro Ala Thr Ala Ile Ala Trp Thr Pro Trp Ala Asp Ala
1355 1360 1365

Thr Thr Ala Ala Gly Gly Gln Ala Pro Asp Ala Ser Ala Gly Gly
1370 1375 1380 

His Glu Pro Asp Thr Arg Ala Gly Gly Pro Asp Arg Glu Leu Leu 
1385 1390 1395 

Arg Arg Gly Gly Leu Thr Pro Leu Asp Pro Gly Ala Ala Leu Asp 
1400 1405 1410 

Val Leu Arg Gly Ala Val Ala Arg Gly Glu Gly Leu Val Thr Val 
1415 1420 1425 

Ala Asp Val Asp Trp Ala Arg Phe Val Ala Ser Tyr Thr Ala Ala 
1430 1435 1440 

Arg Pro Thr Thr Leu Phe Asp Glu Leu Pro Glu Leu Arg Ala Thr 
1445 1450 1455

Arg Glu Ala Glu His Thr Pro Ala Glu Asp Ser Ser Ala Gly Gly 
1460 1465 1470 

Glu Leu Val Arg Ala Leu Ser Gly Arg Pro Ala Ala Asp Gln His
1475 1480 1485 
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Arg Thr Leu Leu Arg Leu Val Arg Ala His Val Ala Ala Val Leu 
1490 1495 1500 

Gly His Asp Glu Ala Glu Ala Ala Asp Pro Asp Arg Ala Phe Arg 
1505 1510 1515 

Glu Leu Gly Phe Thr Ser Val Thr Ala Val Asp Leu Arg Asn Arg 
1520 1525 1530 

Leu Asn Ala Ala Thr Gly Leu Asn Leu Pro Ala Ser Val Val Phe 
1535 1540 1545 

Asp His Pro Ser Ala Arg Val Leu Ala Ala Tyr Leu Arg Ala Glu 
1550 1555 1560 

Leu Leu Gly Pro Glu Ala Asp Glu Asp Thr Ala Glu Ala Val Ala 
1565 1570 1575 

Pro Pro Ser Ala Pro Ala Gly Ala Gly Asp Asp Glu Pro Ile Ala
1580 1585 1590

Val Ile Gly Met Ala Cys Arg Phe Pro Gly Gly Val Asp Ala Pro
1595 1600 1605

Asp Asp Leu Trp Asp Leu Leu Ala Lys Gly Arg Asp Ala Ile Ser
1610 1615 1620 

Arg Phe Pro Thr Asn Arg Gly Trp Asp Val Asp Gly Leu Tyr Asp 
1625 1630 1635 

Pro Asp Pro Glu Ala Pro Gly Arg Thr Tyr Val Arg Glu Gly Gly 
1640 1645 1650

Phe Leu His Asp Ala Pro Asp Phe Asp Ala Ala Phe Phe Gly Ile
1655 1660 1665

Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gln Gln Arg Leu Leu
1670 1675 1680 

Leu Glu Thr Thr Trp Glu Ser Leu Glu Arg Ala Gly Leu Asp Pro 
1685 1690 1695 

Thr Ala Leu Arg Gly Thr Arg Thr Gly Val Phe Val Gly Thr Asn 
1700 1705 1710 

Gly Gln His Tyr Met Pro Leu Leu Arg Asp Gly Ala Asp Asp Phe
1715 1720 1725 

Asp Gly Tyr Leu Gly Thr Gly Asn Ser Ala Ser Val Met Ser Gly 
1730 1735 1740 

Arg Leu Ser Tyr Val Phe Gly Leu Glu Gly Pro Ala Val Thr Val 
1745 1750 1755 

Asp Thr Ala Cys Ser Ala Ser Leu Val Ala Leu His Leu Ala Val
1760 1765 1770

Gln Ala Leu Arg Arg Gly Glu Cys Thr Leu Ala Leu Val Gly Gly
1775 1780 1785 

Ala Thr Val Met Ser Thr Pro Asp Met Leu Val Glu Phe Ser Arg 
1790 1795 1800 

Gln Arg Ala Met Ser Pro Asp Gly Arg Ser Lys Ala Phe Ala Ala
1805 1810 1815 

Ala Ala Asp Gly Val Ala Leu Ser Glu Gly Ala Ala Met Met Val 
1820 1825 1830 

Val Gln Arg Leu Ala Asp Ala Glu Ala Ala Gly His Glu Ile Leu
1835 1840 1845

Ala Val Val Lys Gly Ser Ala Val Asn Gln Asp Gly Ala Ser Asn
1850 1855 1860

Gly Leu Thr Ala Pro Asn Gly Pro Ser Gln Glu Arg Val Ile Arg
1865 1870 1875

Gln Ala Leu Ala Asp Ala Gly Leu Arg Pro Asp Gln Val Asp Ala
1880 1885 1890

Val Glu Ala His Gly Thr Gly Thr Ala Leu Gly Asp Pro Ile Glu
1895 1900 1905

Ala Gln Ala Leu Leu Ala Thr Tyr Gly Arg Asp Arg Pro Ala Gly
1910 1915 1920

Arg Pro Leu Trp Leu Gly Ser Leu Lys Ser Asn Ile Gly His Thr
1925 1930 1935

Gln Ala Ala Ala Gly Ile Ala Gly Val Met Lys Val Ile Leu Ala
1940 1945 1950 

Leu Arg His Asp Thr Leu Pro T^g Thr Leu His Val Asp Arg Pro 
1955 1960 1965 

Thr Pro Arg Val Asp Trp Ala Ser Gly Ala Val Ser Leu Leu Thr 
1970 1975 1980

Glu Pro Val Pro Trp Pro Gln Gly Asp Glu Pro Arg Arg Ala Ala
1985 1990 1995

Val Ser Ser Phe Gly Ile Ser Gly Thr Asn Ala His Val Ile Val
2000 2005 2010

Glu Gln Ala Pro Pro Val Val Arg Glu Pro Ile Asp His Glu Ala
2015 2020 2025 

Asp Glu Val Thr Val Pro Leu Phe Leu Ser Ala Arg Gly Ser Ala 
2030 2035 2040 

Ala Leu Cys Ala Gln Ala Ala Arg Leu Arg Ala Arg Leu Ile Glu
2045 2050 2055

Glu Pro Asp Leu Asp Ile Ala Glu Val Gly Tyr Thr Leu Ala Ala
2060 2065 2070

Thr Arg Ala Arg Phe Glu His Arg Ala Val Val Ile Gly Glu Ser
2075 2080 2085 

Arg Ala Glu Val Gly Asp Ala Leu Ala Ala Leu Ala Arg Gly Glu 
2090 2095 2100 

Glu His Pro Ser Leu Leu Arg Gly Arg Ala Gly Ala Ser Asp Arg 
2105 2110 2115

Val Ala Phe Val Phe Pro Gly Gln Gly Ser Gln Trp Ala Glu Met
2120 2125 2130

Ala Asp Gly Leu Leu Asp Arg Ser Pro Ala Phe Arg Ala Ser Ala
2135 2140 2145

Ser Ala Cys Asp Glu Ala Leu Arg Ala His Leu Asp Trp Ser Val 
2150 2155 2160 

Leu Asp Val Leu Arg Arg Val Pro Asp Ala Pro Ala Leu Ser Arg 
2165 2170 2175 

Val Asp Val Val Gln Pro Val Leu Phe Thr Met Met Val Ser Leu
2180 2185 2190 

Ala Ala Ala Trp Arg Ala Leu Gly Val His Pro Ser Ala Val Val 
2195 2200 2205 

Gly His Ser Gln Gly Glu Ile Ala Ala Ala His Val Ala Gly Gly
2210 2215 2220

Leu Ser Leu Asp Asp Ala Ala Arg Ile Val Ala Leu Arg Ser Gln
2225 2230 2235

Ala Trp Leu Arg Leu Ala Gly Gln Gly Gly Met Val Ala Val Ser
2240 2245 2250 

Leu Pro Val Asp Ala Leu Arg Ala Arg Leu Ala Arg Phe Gly Asp 
2255 2260 2265 

Arg Leu Ser Val Ala Ala Val Asn Ser Pro Gly Thr Ala Ala Val 
2270 2275 2280 

Ser Gly Tyr Pro Asp Ala Leu Ala Glu Leu Val Asp Glu Leu Thr 
2285 2290 2295 

Ala Glu Gly Val His Ala Lys Ala Ile Pro Gly Val Asp Thr Ala
2300 2305 2310

Gly His Ser Ala Gln Val Glu Val Leu Lys Asp His Leu Met Ala
2315 2320 2325

Ala Leu Ala Pro Val Ser Pro Arg Ser Ser Gln Ile Pro Phe Tyr
2330 2335 2340 
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Ser Thr Val Thr Gly Gly Leu Leu Asp Thr Ala Leu Leu Asp Ala 
2345 2350 2355 

Ala Tyr Trp Tyr Arg Asn Met Arg Asp Pro Val Glu Phe Glu Gln
2360 2365 2370 

Ala Thr Arg Ala Met Leu Ala Asp Gly His Glu Gly Phe Leu Glu 
2375 2380 2385 

Pro Ser Pro His Pro Met Leu Ser Val Ser Leu Gln Gly Thr Ala
2390 2395 2400 

Ala Asp Ala Gly Val Ala Ala Thr Val Leu Gly Thr Leu Arg Arg 
2405 2410 2415 

Gly Lys Gly Gly Ala Arg Trp Phe Gly Met Ala Leu Gly Leu Ala 
2420 2425 2430 

His Ala His Gly Ile Glu Ile Asp Ala Ser Val Leu Phe Gly Thr
2435 2440 2445

Asp Ser Arg Arg Val Asp Leu Pro Thr Tyr Pro Phe Gln Arg Glu
2450 2455 2460 

Arg Phe Trp Tyr His Pro Pro Ala Ala Arg Gly Asp Val Ala Ser 
2465 2470 2475 

Ala Gly Leu Ser Gly Ala Asp His Pro Leu Leu Gly Gly Ala Val 
2480 2485 2490 

Glu Leu Pro Asp Arg Gly Gly His Val Tyr Pro Ala Arg Leu Gly 
2495 2500 2505 

Val Arg His His Pro Trp Leu Gly Glu His Ala Leu Leu Gly Ala 
2510 2515 2520 

Ala Ile Leu Pro Gly Ala Ala Tyr Ala Glu Leu Ala Leu Trp Ala
2525 2530 2535

Gly Arg Arg Asp Gly Ala Gly Arg Ile Glu Glu Leu Thr Leu Asp
2540 2545 2550

Ala Pro Leu Val Val Ala Asp Glu Ser Ala Ala Gln Leu Arg Leu
2555 2560 2565

Val Val Gly Pro Ala Asp Ala Glu Gly Arg Arg Gln Leu Thr Val
2570 2575 2580 

His Ser Arg Ala Asp Gly Ala Asp Ala Asp Thr Ala Trp Thr Arg 
2585 2590 2595 

His Ala Gln Gly Thr Leu Val Pro Ala Asp Ala Asp Ala Ala Gly
2600 2605 2610 

Ser Gly Asp Pro Gly Ala Pro Trp Pro Pro Ala Gly Ala Glu Pro 
2615 2620 2625 



Val Glu Val Ala Gly Leu Tyr Asp Arg Phe Ala Asp Arg Gly Tyr 
2630 2635 2640 

Gln Tyr Gly Pro Ser Phe Arg Gly Val Arg Ala Ala Trp Arg Ala
2645 2650 2655

Gly Asp Thr Val Tyr Ala Glu Val Ala Leu Pro Val Pro Gln Pro
2660 2665 2670 

Gly Ser Pro Arg Phe Gly Val His Pro Ala Leu Leu Asp Ala Ala 
2675 2680 2685 

Phe Gln Ala Met Ser Leu Gly Ala Phe Phe Pro Glu Asp Gly Gln
2690 2695 2700 

Val Arg Met Pro Phe Ala Leu Arg Gly Val Ser Ser Ser Gly Val 
2705 2710 2715 

Gly Ala Asp Arg Leu Arg Val Thr Ile Ser Pro Ala Gly Ala Glu
2720 2725 2730

Ala Val Arg Ile Ala Cys Val Asp Glu Arg Gly Asn Pro Val Val
2735 2740 2745

Val Ile Asp Ser Leu Val Ala Arg Ala Val Pro Val Glu Ala Leu
2750 2755 2760 

Thr Pro Gly Thr Pro Gly Thr Gly Asp Gly Ala Leu His His Val 
2765 2770 2775 

Ala Trp Thr Ala Arg Pro Glu Pro Gly Val Ala Ala Val Gln Arg
2780 2785 2790 

Trp Ala Val Val Gly Ala Ala Asp Pro Gly Leu Ala Gly Gly Leu 
2795 2800 2805 

Asp Arg Ala Gly Gly Leu Cys Gly Ala Tyr Pro Asp Leu Ala Gly 
2810 2815 2820 

Leu Val Ala Ala Val Ala Glu Gly Ala Ala Leu Pro Asp Val Val 
2825 2830 2835 

Ala Val Pro Val Pro Ser Gly Ala Pro Val Gly Pro Asp Ala Val 
2840 2845 2850 

Arg Ala Thr Val Leu Gly Ala Leu Asp Leu Ile Arg Ala Trp Leu
2855 2860 2865 

Ala Val Glu Gly Arg Leu Gly Leu Ala Arg Leu Ala Phe Val Thr 
2870 2875 2880 

Thr Ser Ala Val Ala Val Gly Asp Gly Thr Glu His Val Asp Pro 
2885 2890 2895 

Val Ser Ala Ala Leu Trp Gly Leu Val Arg Ser Ala Gln Ser Glu
2900 2905 2910

Glu Pro Gly Arg Phe Val Leu Val Asp Leu Asp Ala Asp Pro Ala
2915 2920 2925

Ser Ala Ser Ala Leu Pro Ala Ala Leu Ala Ala Gly Glu Pro Gln
2930 2935 2940

Leu Ala Val Arg Ala Gly Ala Val His Val Pro Arg Leu Val Arg 
2945 2950 2955 

His Arg Pro Arg Pro Asp Gly Pro Leu Thr Pro Pro Ala Gly Ala 
2960 2965 2970 

Ala Trp Arg Leu Ala Ala Gly Gly Gln Gly Thr Leu Glu Gly Leu
2975 2980 2985

Ala Leu Val Pro Ala Pro Asp Ala Leu Ala Pro Leu Ala Pro Gly
2990 2995 3000

Gln Val Arg Val Ala Val Arg Ala Ala Gly Val Asn Phe Arg Asp
3005 3010 3015

Thr Leu Ile Ala Leu Gly Met Tyr Pro Gly Thr Pro Val Leu Gly
3020 3025 3030

Ala Glu Gly Ala Gly Val Ile Thr Glu Val Ala Pro Asp Val Ala
3035 3040 3045

Gly Phe Ala Pro Gly Asp Arg Val Leu Gly Met Trp Thr Gly Gly 
3050 3055 3060 

Leu Gly Pro Val Ala Val Ala Asp Ala Arg Met Leu Ala Arg Val 
3065 3070 3075 

Pro Arg Gly Trp Ser Tyr Ala Glu Ala Ala Ser Val Pro Ala Val 
3080 3085 3090 

Phe Leu Thr Ala His Tyr Ala Leu Thr Arg Leu Ala Gly Ile Arg
3095 3100 3105

Pro Gly Gln Ser Leu Leu Val His Ala Gly Ala Gly Gly Val Gly
3110 3115 3120

Met Ala Thr Leu Gln Leu Ala Arg His Leu Gly Val Glu Val Tyr
3125 3130 3135 

Ala Thr Ala Ser Arg Gly Lys Trp Asp Thr Leu Arg Gly Leu Gly 
3140 3145 3150 

Leu Asp Asp Ala His Ile Ala Asp Ser Arg Ser Leu Asp Phe Ala
3155 3160 3165 

Gly Arg Phe Leu Ala Ala Thr Gly Gly Arg Gly Val Asp Val Val 
3170 3175 3180 

Leu Asn Ser Leu Ala Gly Asp Phe Val Asp Ala Ser Leu Arg Leu 
3185 3190 3195

Leu Pro Arg Gly Gly His Phe Leu Glu Leu Gly Lys Ala Asp Val
3200 3205 3210

Arg Asp Pro Asp Arg Ile Ala Ala Asp His Pro Gly Val Gly Tyr
3215 3220 3225

Arg Ala Phe Asp Leu Val Glu Ala Gly Pro Glu Leu Val Gly Gln
3230 3235 3240 

Leu Leu Gly Glu Leu Met Glu Leu Phe Ala Ala Gly Val Leu Ser 
3245 3250 3255 

Pro Leu Pro Leu Thr Val Arg Asp Val Arg Arg Ala Arg Glu Ala 
3260 3265 3270 

Phe Arg Leu Ile Ser Gln Ala Arg His Val Gly Lys Val Val Leu
3275 3280 3285

His Leu Asp Gln Glu Gly Met Arg Arg Arg Met Ala Arg Gly Gly
3485 3490 3495

Val Leu Pro Leu Thr Thr Asp Gln Gly Leu Ala Leu Phe Asp Ala
3500 3505 3510

Ala Gln Leu Val Asp Glu Ala Leu Gln Val Pro Ile Arg Leu Asn
3515 3520 3525 

Val Gly Ala Leu Arg Ala Ala Gly Arg Val Pro Ala Leu Leu Ala 
3530 3535 3540 

Asp Leu Val Pro Ala Ala Ala Ser Gly Ala Pro Ala Ala Thr Pro 
3545 3550 3555 

Thr Arg Asp Asp Ala Asp Arg Thr Leu Ala Asp Arg Leu Ala Gly 
3560 3565 3570 

Leu Thr Val Ala Glu Gln Arg Glu Leu Val Leu Glu Ser Val Arg
3575 3580 3585

Gly His Ala Ala Ala Val Leu Gly His Ala Asp Pro Gln Ala Val
3590 3595 3600 

Asp Ala Asp Arg Ala Phe Arg Glu Leu Gly Phe Asp Ser Leu Thr 
3605 3610 3615 

Ala Val Glu Leu Arg Asn Arg Leu Ala Thr Ala Ser Gly Leu Arg 
3620 3625 3630 

Leu Pro Ala Thr Leu Val Phe Asp His Pro Thr Pro Glu Ala Leu 
3635 3640 3645 

Ala Glu His Leu Leu Ala Gly Leu Ala Pro Glu Gln Ala Arg Ala
3650 3655 3660 

Glu Leu Pro Leu Leu Ala Glu Leu Gly Arg Leu Glu Ala Ala Leu 
3665 3670 3675 

Ala Ala Thr Asp Gly Ala Ala Leu Asp Gly Leu Asp Asp Leu Val 
3680 3685 3690 

Arg Arg Glu Val Gly Val Arg Ile Ala Ala Leu Ala Ala Arg Trp
3695 3700 3705 

Gly Ala Ala Gly Asp Asp Val Ala Gly Ser Asp Gly Gly Gly Thr 
3710 3715 3720 

Ala Asp Ala Leu Glu Ser Ala Asp Asp Asp Glu Ile Phe Ala Phe
3725 3730 3735

Ile Asp Glu Arg Phe Arg Ala
3740 3745 

<210> 15 
<211> 11238 
<212> DNA 
<213> micromonospora carbonacea subspecies aurantiaca
<400> 15 

gtgtctgtca acaacgaaga caagcttcgc gagtatctgc gtcgtgccat ggcggatctc 60 

catgagtccc gcgagcggtt gcggcagtac gagtccgctg ctgctgtgga tgatccggtg 120 

gtggtggtgg ggatgggttg tcgttttccg ggtggggtgg tgtgtgcgga gggtttgtgg 180 

gatttggtgt tggggggtgg ggatgcggtg tcggggtttc cggtggatcg gggttgggat 240 

gtggaggggt tgtttgatcc ggtgcggggt gtggtgggga agtcgtatgt gcgggagggg 300 

gggtttgtgt atgacgcggg gatgttcgat gcggagtttt ttggtgtgtc gccgcgtgag 360 

gcggtggcga tggatccgca gcagcgtttg tttttggagg tgtcgtggga ggcgttggag 420 

cgtgcgggga ttgatccgtt gggtttgcgg ggttcgcgga cgggtgtgta tgtgggggtg 480 

atgggtcagg agtatgggcc gcggttggtg gagtcgggtg gtgggtttga gggttatttg 540 

ttgacgggga cgtcgccgag tgtggtgtcg ggtcgtgttt cgtatgtgtt ggggttggag 600 

ggtccgtcga tttcggttga tacggcgtgt tcgtcgtcgt tggtggcgtt gcatttggcg 660 

tgtcaggggt tgcggttggg tgagtgtgat gtggcgttgg cgggtggggt gacggtgatt 720

gcggcgccgg ggttgtttgt ggagttttct cggcagggtg ggttgtcggg tgatgggcgg 780 

tgtcgggcgt ttgcgggtgg tgcggatggg acggggtggg gggagggtgc gggggtggtg 840 

gtgttggagc ggttgtcggt ggcgcgggag cgtggtcatc gggtgttggc ggtggtgcgg 900 

ggttctgcgg tgaatcagga tggtgggtcg aatggtttga cggcgccgtc gggggtggcg 960 

cagcgtcggg tgattggtgc ggcgttggtg gcggcgggtt tgggtgtgtc ggatgtggat 1020 

gtggtggagg cgcatgggac ggggactcgg ttgggtgatc cgattgaggc tgaggcgttg 1080 

ttggggtcgt atgggcgggg tcgtgtgggt ggggcgttgt tgttgggttc ggtgaagtcg 1140 

aatattggtc atacgcaggc ggctgcgggt gtggcgggtg tgatcaagat ggtgatggcg 1200 

ttgcgggcgg gggtggtgcc ggcgacgttg catgtggatg tgccgtcgcc gttggtggat 1260 

tggtcttcgg gtggggtgga gttggtgacg gaggcgcggg attggccggt ggtgggtcgt 1320 

gtgcgtcgtg cgggtgtgtc ggcgtttggg gtgtcgggga cgaatgcgca tctgattttg 1380

gagcaggccc ccgagttcga cgatcctgcc gattccgatt ccgattccga ttccgatgcc 1440 

ggtgtcgtgg atggcggcga gggtggtgtt ggcaggagct tgtcggtggt tccggtggtg 1500 

gtgtcgggtc gttcggtggg ggctttgcgg gcgtatgcgg gtcggttgcg tgaggtgtgc 1560 

gcggggttgt ctgacggtgg tggctccggt ggtggttctg gtttggtgga tgtgggttgg 1620






tcgttggtgt cgtcgcggtc ggtgtttgag catcgggcgg tcgtgttcgg tgggggtgtg 1680

gaggaggttg ttgctggtct tggtgcggtg gcttctgggg cggtggcttc gggttcggtg 1740

gtggtgggtt cggtggcgtc gggtgttgct ggtggtggtg gtcgggtggt gtttgtgttt 1800

ccgggtcagg gttggcagtg ggtgggtatg ggtgcggcgc tgctggacga gtcggaggtg 1860

ttcgccgagt cgatggtgga gtgtggtcgg gcgttgtcgg ggtttgtgga ttgggatttg 1920

ttggaggtgg tgcgcggcgg ggcgggtgag ggggtgtggg gtcgggttga tgtggtgcag 1980

ccggtgtcgt gggcggtgat ggtgtcgttg gcgcggtfcgt ggatgtcggt gggtgtggtg 2040

ccggatgcgg tggtgggtca ttcgcagggt gaggttgctg cggcggtggt ggggggtgtg 2100

ttgagtgtgg ctgatggggc gcgggtggtg gcgttgcggt cgcgggtgat cggtgaggtg 2160

ttggccggtg gtggtgcgat ggtgtcggtc ggactgccga tcgtggatgt gcaggaacgg 2220

ttggcggggt ggggtggtcg gttgggtgtg gcggcggtga atggtccgtc gttgacggtg 2280

gtgtcggggg atgtggatgc tgctgtgggg tttgttggtg agtgtgagcg ggatggggtg 2340

tgggtgcggc gggtggcggt ggattatgcg tcgcattcgg cgcatgtgga ggcggtggag 2400

gggatgctgt cggggttgtt gggtggtttg tgtccggggc ggggtgtggt gccgttttat 2460

tcgtcggtgg tgggtggtgt ggttgatggg gtgggtttgg atggtgggta ttggtatcgg 2520

aatctgcgtg agcgggtgtt gttttcggat gtggtggggc ggcttgttgg ggatgggttt 2580

tcggggtttg tggagtgttc ggggcatccg gtgttggcgg gtggggtgtt ggagtcggtg 2640

gcggtggtgg atccggatgt gcggccggtg gtggtggggt cgctgcgccg tgatgatggt 2700

gggtggggcc ggtttctgac gtcggtgggt gaggcgttcg tcggcgggat gagtgttgac 2760

tggaagggtg tgttcgcggg ggcgggcgcg cggttggttg acctgccgac gtatccgttc 2820

caacgccgcc actactgggc accgactccc accaaccccg ccaccaaccc cgccaccaac 2880

cccgccacca accccgccac gggcgacacc accaccgccg acccggcggg tgacctgcgg 2940

tatcggatca cctggaaacc gttgccgacc gacgaccccc gacccctcac caaccgctgg 3000

ctgctgatgg tgcccgaggc gctggccggt gacggggtgg tggcgggcgt acggcaggcg 3060

ctggccgcgc gtggcgcctc cgtcgaactg ctgaccgtcg gcaccgccga ccgggccggc 3120

cttgccgcgc tcctgacctc cgccgccccc ggcgacccgg aggcggccgg cccggcgggc 3180

gtggtctccc tgctggcgct cgccgagggc gcggacgcgc gccacccggc cgtaccgctc 3240

ggcctgaccg cctcgctcgc cctgatccag gcattggcgg acgcggggac gcaggcccgc 3300

ctctgggcgg tcacccgggg ggccgtcgcc gtgtcctccg gcgaggtgcc ggacgccggg 3360

caggcccagg tgtgggggct cggccgggtc gcggccctcg aactgccgga ccgatggggc 3420

gggctggtgg acctgccggc gctcaccggg gagcgtgcct tcgcgcagct cgccgatgtc 3480 

gtgggcggct cgaacggcga ggaccaggtc gccgtacggg cctccggcgt ctacggtcga 3540 

cgcctcgtgc gttcccgcgc caccgtcacg tccggcgact ggccggcccg gggcaccatc 3600 

ctcgtcgtcg gggacaccgg cccggtcgcc gcgctcctgg ccggccgcct cctcggcgac 3660 

ggggcggcgc acgtggtgct cgccggcccg gccgccgcgt ccaccgtcgg gctcaccggc 3720 

ggggccgacc gggtggccct gatcgactgc gacccgagcg accgggacgc gctcgccggg 3780 

ctgctcggcg cgtaccggcc cacgacgatc gtggtggctc cgcccgccgt cgcgctcacc 3840

gccctcgccg agaccacgcc ggaggacttc gtcgccgccg tcgccgcgaa gacgacgacg 3900

gcagtgcacc tcgacgccct tgcggcggag gcggaactgg agctcgacgc gttcgtcgtc 3960

ttctcctcgg tctccggcac ctggggcggc gcggggcacg gcggctacgc ggcgggcacc 4020 

gcccggctgg acgcgctggt cgaggagagg cgggcccgtg gcctgcccgc cacggcgatc 4080 

gcgtggacgc cgtgggccga cgcgaccaca gccgccggcg ggcaggcacc cgatgccagc 4140 

gccggcgggc acgaacccga cacgagggcc gggggccccg accgcgaact gctgcgccgg 4200 

ggtggcctca ccccgttgga cccgggggcc gcgctggacg tgctgcgcgg ggcggtggcg 4260 

cggggcgagg gcctggtgac cgtggccgac gtcgactggg cgcggttcgt cgcctcgtac 4320 

accgcggccc ggcccaccac gctcttcgac gaactgcccg agctgcgggc gacccgggag 4380 

gcggagcaca ccccggccga ggactcgtcg gccggcggcg aactggtccg tgccctcagc 4440 

ggccggcccg cggccgatca gcaccggacg ctgctgcggc tggtccgtgc gcacgtcgcg 4500 

gccgtcctgg ggcacgacga ggccgaggcg gccgatccgg accgggcgtt ccgggaactc 4560 

ggcttcacct cggtgacggc ggtggacctg cggaaccggc tgaacgcggc caccgggctg 4620 

aacctgccgg cgtccgtcgt cttcgaccat cccagcgccc gggtgctggc cgcgtacctg 4680

cgtgccgagc tgctcgggcc gg^ggccgac gaggacacgg cggaggccgt cgccccgccg 4740 

tccgcgccgg ccggggcggg cgacgacgag ccgatcgcgg tgatcgggat ggcctgtcgg 4800 

ttcccgggcg gggtcgacgc ccccgacgac ctgtgggatc tgctggcgaa gggccgcgac 4860 

gccatctcca ggttccccac gaaccggggc tgggacgtcg acggcctgta cgacccggac 4920 

ccggaggcgc ccggccgcac ctacgtccgc gagggcggct tcctgcacga cgcgcccgac 4980 

ttcgatgccg cgttcttcgg gatctcgccc cgcgaggccc tcgccatgga tccgcagcag 5040

cgcctgctgc tggagaccac gtgggagtcc ctggaacggg ccgggttgga cccgaccgcg 5100

ttgcgcggca cccggaccgg ggtgttcgtg gggaccaacg gccagcacta catgccgctg 5160

ctgcgagacg gcgcggacga cttcgacggc tacctcggca ccggcaactc ggccagcgtc 5220

atgtccggcc ggctctccta cgtcttcggc ctggagggcc cggcggtgac cgtggacacg 5280

gcctgctccg cctccctcgt ggcgctgcac ctcgcggtgc aggcgctgcg ccggggcgag 5340

tgcacgctgg ccctggtcgg cggggccacg gtgatgtcga cgccggacat gctggtggag 5400

ttctcccggc agcgggcgat gtcgccggac ggccggtcga aggcgttcgc cgccgccgcc 5460

gacggggtgg cgctcagcga gggcgccgcc atgatggtgg tgcagcggct cgccgacgcg 5520

gaggccgccg ggcacgagat cctggccgtg gtcaagggct cggccgtcaa ccaggacggg 5580

gccagcaacg gcctcaccgc cccgaacggg ccctcccagg aacgggtcat ccggcaggcg 5640

ctggccgacg ccggcctgcg gccggaccag gtggacgcgg tcgaggcgca cggcaccggc 5700

accgccctgg gcgaccccat cgaggcgcag gcgctgctcg ccacgtacgg ccgggaccgg 5760

ccggcgggcc ggccactgtg gctcggctcg ctgaagtcca acatcggtca cacccaggcc 5820

gccgccggca tcgccggggt gatgaaggtg atcctggcgc tgcggcacga cacgctgccg 5880

cgcacgctgc acgtggaccg gccgacgccc cgggtggact gggcttccgg ggcggtgtcg 5940

ttgctgaccg agccggtgcc gtggccgcag ggcgacgaac cccgccgggc ggcggtgtcc 6000

tcgttcggga tcagcggcac caacgcccac gtgatcgtcg agcaggcgcc gccggtggtg 6060

cgggaaccga tcgaccacga ggcggacgag gtcaccgtcc cgctgttcct gtcggcccgg 6120

gggagcgccg cgctctgcgc ccaggcggca cggctgcggg cccggttgat cgaggaaccc 6180

gacctggaca tcgccgaggt cggctacacg ctggcggcca cccgggcccg cttcgagcac 6240

cgggccgtgg tgatcgggga gagccgcgcg gaggtcggcg acgcgctcgc cgcgctggcc 6300

cggggcgagg agcacccgtc gctgctgcgg gggcgggccg gcgcgagcga ccgggtcgcg 6360

ttcgtctttc ccggccaggg ctcgcagtgg gccgagatgg ccgacggcct gctcgaccgc 6420

tccccggcct tccgggcgag cgcgtcggcg tgcgacgagg cgctgcgggc gcacctcgac 6480

tggtccgtgc tggacgtgct gcgtcgcgtg ccggacgcgc ctgcgctgag ccgggtcgac 6540

gtggtccagc cggtgctgtt cacgatgatg gtgtcgctgg cggcggcctg gcgggcgctg 6600

ggcgtgcacc cgtccgccgt ggtcggccac tcgcagggtg agatcgcggc ggcccacgtg 6660

gcgggcggcc tctcgctgga cgacgcggcg cgcatcgtcg ccctgcgcag ccaggcgtgg 6720

ctgcggctgg ccgggcaggg cgggatggtg gcggtgtcgc tccccgtcga cgcgctccgc 6780

gcccgcctgg cgcggttcgg cgaccggctg tccgtcgccg cggtcaacag ccccggtacg 6840

gcggcggtga gcggctaccc cgacgcgctc gccgaactcg tcgacgagct gaccgccgag 6900 

ggcgtgcacg ccaaggcgat cccgggggtg gacacggccg ggcactccgc gcaggtggag 6960 

gtgctgaagg accacctgat ggccgccctc gccccggtgt cgccccgcag ctcgcagatc 7020 

cccttctact cgaccgtcac gggcggcctg ctggacaccg cgctgctgga cgccgcctac 7080 

tggtaccgca acatgcgcga cccggtggag ttcgagcagg cgacccgggc gatgctcgcg 7140 

gacgggcacg aggggttcct ggagcccagc ccgcacccga tgctgtcggt gtcgttgcag 7200 

ggcaccgcgg ccgatgccgg ggtcgccgcg acggtgctgg ggacactgcg gcgcggcaag 7260

ggcggcgccc gctggttcgg catggcgctc gggctcgccc acgcccacgg gatcgagatc 7320 

gacgcgagtg tgctcttcgg aaccgactcg cgccgggtcg acctgccgac gtacccgttc 7380 

cagcgcgagc gcttctggta tcacccgccg gccgcgcgcg gggacgtggc ctccgccggg 7440 

ctcagcggtg ccgaccatcc gctgctgggc ggggcggtcg agctgcctga ccggggcggc 7500 

cacgtgtatc cggcccggct cggcgtccga caccacccgt ggctcggcga gcatgccctg 7560 

ctgggcgcgg cgatcctgcc cggggccgcg tacgcggaac tcgccctgtg ggccgggcgg 7620 

cgtgacgggg ccggccggat cgaggagctg accctcgacg cgccgctggt ggtggccgac 7680 

gagtcggcgg cgcaactgcg gctcgtggtg ggcccggcgg acgcggaggg gcgccggcag 7740 

ctcaccgtcc actcgcgcgc cgacggcgcg gacgcggaca ccgcgtggac ccggcacgcg 7800 

cagggcaccc tcgtgccggc cgacgccgac gccgccggga gcggggaccc gggcgcgccc 7860 

tggccgccgg ccggggccga gcccgtcgag gtggcgggcc tgtacgaccg gttcgccgac 7920 

cggggctacc agtacgggcc gtcgttccgg ggggtccggg ccgcctggcg ggccggcgac 7980 

acggtgtacg ccgaggtggc cctgcccgtc ccgcagcccg ggagcccgcg cttcggtgtc 8040 

cacccggcgc tgctcgacgc ggcgttccag gcgatgagcc tcggcgcgtt cttccccgag 8100 

gacgggcagg tccggatgcc gttcgccctg cggggcgtgt cgtcgtccgg ggtcggggcc 8160 

gaccggctgc gggtcaccat cagcccggcc ggtgccgagg cggtccggat cgcctgcgtc 8220 

gacgagcggg gcaacccggt cgtggtgatc gactccctgg tggcgcgcgc ggtgccggtg 8280 

gaggcgctca cccccggcac ccccggcacc ggggacggcg cgctgcacca cgtcgcctgg 8340 

accgcccggc cggaaccggg ggtcgccgcc gtgcagcgct gggcggtcgt gggcgcggcc 8400 

gatcccgggc tggccggggg cctggaccgg gcgggcggcc tctgcggggc gtaccccgat 8460

ctcgccggtc tggtcgcggc ggtggccgaa ggggcggcgc tgcccgacgt ggtcgcggtg 8520

ccggtcccgt cgggcgcgcc ggtcgggccc gacgcggtgc gcgccaccgt gctcggcgcc 8580 

ctggacctga tccgggcctg gctcgcggtc gagggccggc tggggctggc caggctggcg 8640 

ttcgtcacca cctcggcggt ggcggtcggc gacggcaccg agcacgtgga cccggtgtcg 8700 

gccgccctgt gggggctggt gcgttccgcc cagtccgagg agcccggccg gttcgtcctc 8760 

gtcgacctgg acgccgaccc ggccagcgcc tcggccctgc ccgccgcgct cgccgccggt 8820 

gagccgcaac tggccgttcg cgccggggcg gtgcacgtgc cccggctggt tcggcaccga 8880 

ccccgcccgg acggcccgct gacgcccccg gccggtgccg cgtggcggct cgccgccggt 8940 

gggcagggca ccctggaggg cctggcgctg gtcccggccc cggacgcctt ggcgccgctg 9000

gcccccgggc aggtccgggt cgcggtgcgc gccgccggag tgaacttccg ggacaccctc 9060 

atcgcgctcg gcatgtaccc gggcacgccg gtgctgggtg ccgagggggc cggggtgatc 9120 

accgaggtcg cgccggacgt ggccggcttc gcccccggcg accgggtgct gggcatgtgg 9180 

accggcggcc tggggccggt ggcggtcgcc gacgcccgga tgctcgcccg ggttccgcgc 9240

ggctggtcgt acgccgaggc cgcgtcggtg ccggccgtct tcctcacggc ccactacgcg 9300 

ctcaccaggc tcgccgggat ccgcccgggg cagtcgctgc tggtgcacgc gggggccggc 9360 

ggcgtcggca tggcgaccct ccaactggcc cggcacctgg gcgtggaggt ctacgccacg 9420 

gcgagccggg gcaagtggga caccctgcgt ggcctcggcc tggacgacgc gcacatcgcc 9480 

gactcccgca gcctcgactt cgccggacgg ttcctggccg ccaccggggg gcgcggcgtc 9540 

gacgtggtgc tgaactccct tgccggggac ttcgtggacg cgtccctgcg gctgctgccg 9600 

cgcggcggcc acttcctgga actgggcaag gccgacgtcc gcgaccccga ccggatcgcg 9660 

gccgaccacc cgggggtcgg ctagcgggcg ttcgacctcg tcgaggctgg tccggagctg 9720 

gtcgggcagc tgctcggcga gctgatggag ctgttcgccg ccggggtgct cagcccgctg 9780 

ccgttgaccg tgcgggacgt ccggcgggcc cgggaggcgt tccgcctgat cagccaggcc 9840 

cggcacgtcg gcaaggtggt gctgaccatg ccgcccgcgt tcggcgcgta cggcaccgtc 9900

ctggtcaccg gcggcaccgg gacgctcggc ggcgccgtcg cccggcacct ggtcgcccgg 9960

cacggcgtac ggcacctggt gctcaccggc cgcagcggcc cggcggcgga cggggcgtcc 10020 

gcgctcgtcg acgagctgac cgcgtccggc gcgtcggtga ccgtcgtcgc ctgcgacgcc 10080 

gccgaccggg tcgcgctgcg ccggctgctc gacggcattc cggccgcgca cccgctcacc 10140 

gccgtcgtgc acgctgccgg cgtcctcgac gacgccacca tcaccgcgct gaccgccggg 10200

caggtggacg cggtgctgcg gcccaaggcc gacgcggtga tcaacctgca cgagttgacc 10260

cgggaccggg agctgtccgc gttcgtgctg ttctcctcgg cggcggccct gttcggcagc 10320

ccggggcagg gcaactactc ggcggccaac STSgttcgtcg acgcgttcgc ccagtaccgc 10380

cgcgcgcagg ggctccacgc ggtgtcgctg gcctggggcc tgtgggccga cagcagccgg 10440

atggccgggc acctcgacca ggaggggatg cggcgccgga tggcgcgcgg cggcgtcctg 10500

ccgctcacca ccgaccaggg cctcgccctg ttcgacgccg cgcagctggt ggacgaggcg 10560

ctccaggtgc cgatccggct caacgtcggc gcgttgcggg ccgccgggag ggtccccgcg 10620

ctcctcgccg acctggtgcc ggcggcggcg tcgggggccc cggccgccac cccgacccgg 10680

gacgacgcgg accgcacgct cgccgaccgg ctcgccgggc tgaccgtggc cgaacagcgg 10740

gagctggtgc tggagagcgt gcgcggacac gcggcggccg tcctcggaca cgccgacccg 10800

caggccgtcg acgccgaccg ggccttccgg gaactcggct tcgactcgct gacggcggtg 10860

gagctgcgca atcggctggc caccgcgtcc gggctgcgcG tgccggcgac gctggtcttc 10920

gaGcacccca ccccggaagc gttggcggag cacctgctcg ccgggctcgc gcccgagcag 10980

gcccgggccg agttgccgtt gctggccgag ctgggccggc tggaggcggc cctggccgcc 11040

accgacgggg ccgccctcga cgggctggac gacctggtgc gccgggaggt cggcgtccgg 11100

atcgcggcgc tggccgccag gtggggcgcg gccggcgacg acgtggccgg cagcgacggc 11160

ggcgggacgg ccgacgcgct cgagtccgct gacgacgacg agatcttcgc gttcatcgac 11220

gagcggttcc gcgcctga 11238



<210> 16 
<211> 1574 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 16

Met Ser Asn Glu Gln Lys Leu Arg Glu Tyr Leu Arg Leu Thr Thr Thr
1 5 10 15

Glu Leu Ala Arg Ala Thr Asp Arg Leu Arg Ala Val Glu Ala Arg Ala 
20 25 30 

His Glu Pro Ile Ala Ile Val Gly Met Ala Cys Arg Tyr Pro Gly Gly
35 40 45

Val Gly Ser Pro Glu Glu Leu Trp Glu Leu Val Ala Ser Gly Thr Asp
50 55 60

Ala Ile Ser Pro Phe Pro Asp Asp His Gly Trp Asp Gly Asp Ala Leu
65 70 75 80

Tyr Asp Pro Asp Pro Glu Ala Ala Gly Arg Thr Tyr Cys Arg Glu Gly
85 90 95

Gly Phe Leu Ala Gly Val Gly Asp Phe Asp Ala Ala Phe Phe Gly Ile
100 105 110

Ser Pro Arg Glu Ala Leu Ala Met Asp Pro Gln Gln Arg Leu Leu Leu
115 120 125

Glu Thr Ser Trp Glu Ala Leu Glu Arg Ala Gly Ile Pro Pro Asp Ser
130 135 140

Leu Arg Gly Ser Arg Thr Gly Val Cys Val Gly Ala Trp His Gly Gly
145 150 155 160

Tyr Thr Asp Val Val Gly Gln Pro Pro Ala Glu Leu Glu Gly His Leu
165 170 175

Leu Thr Gly Gly Val Val Ser Phe Thr Ser Gly Arg Ile Ser Tyr Ala
180 185 190 

Leu Gly Leu Glu Gly Pro Ala Leu Thr Val Asp Thr Ala Cys Ser Ser 
195 200 205 

Ser Leu Val Ala Leu His Leu Ala Val Arg Ala Leu Arg Gln Gly Glu
210 215 220

Cys Asp Leu Ala Leu Ala Gly Gly Ala Thr Val Leu Ala Ser Pro Ala
225 230 235 240

Val Phe Val Gln Phe Ser Arg Gln Arg Gly Leu Ala Pro Asp Gly Arg
245 250 255 

Cys Lys Ala Phe Ala Asp Ser Ala Asp Gly Phe Gly Pro Ala Glu Gly 
260 265 270 

Val Gly Met Leu Val Val Glu Arg Leu Ser Asp Ala Val Arg His Gly 
275 280 285 

Arg Arg Val Leu Ala Leu Val Thr Gly Thr Ala Val Asn Gln Asp Gly
290 295 300

Ala Ser Asn Gly Leu Thr Ala Pro Ser Gly Pro Ala Gln Glu Lys Val
305 310 315 320

Leu Arg Gln Ala Leu Val Asp Ala Arg Val Thr Ala Ala Asp Val Asp
325 330 335

Ala Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Ile Glu
340 345 350 

Val Arg Ala Leu Met Asn Val Tyr Gly Ala Gly Arg Pro Ala Asp Arg 
355 360 365

Pro Leu Trp Leu Gly Ser Leu Lys Ser Asn Ile Gly His Thr Gln Ala
370 375 380

Ala Ala Gly Val Gly Gly Val Ile Lys Thr Val Leu Ala Met Arg His
385 390 395 400

Gly Val Leu Pro Pro Thr Leu His Val Asp Ala Pro Thr Thr Glu Val
405 410 415

Asp Trp Ser Ala Gly Gln Val Ala Leu Leu Arg Ala Glu Thr Pro Trp
420 425 430

Pro Asp Thr Gly Arg Pro Arg Arg Ala Gly Val Ser Ser Phe Gly Val
435 440 445

Ser Gly Thr Asn Ala His Val Val Leu Glu Gln Ala Pro Gly Pro Ala
450 455 460 

Ala Ala Pro Ala Gly Asp Ala Pro Pro Ala Glu Thr Arg Pro Val Gly 
465 470 475 480 

Asp Pro Pro Pro Val Val Pro Leu Val Leu Ser Ala Arg Ser Gln Pro
485 490 495

Ala Leu Ala Gly Gln Ala Arg Arg Leu Arg Asp Leu Leu Ala Ala Ala
500 505 510 

Pro Glu Thr Asp Leu Ala Ser Ala Gly Leu Ala Leu Ala Thr Ala Arg 
515 520 525 

Ser Val Phe Asp His Arg Ala Val Val Thr Ala Ala Gly Arg Pro Gln
530 535 540 

Ala Leu Asp Ala Leu Asp Leu Leu Ala Gly Gly Glu Pro Gly Pro Ala 
545 550 555 560 

Val Thr Thr Gly Val Ala Ala Pro Thr Gly Arg Thr Val Phe Val Phe 
565 570 575 

Pro Gly Gln Gly Thr His Trp Ala Gly Met Gly Ala Asp Leu Leu Asp
580 585 590

Gln Ser Pro Val Phe Ala Glu Ser Met Arg Arg Cys Glu Gln Ala Leu
595 600 605

Ser Ala His Thr Asp Trp Lys Leu Gly Glu Val Ile Arg Gly Ala Ala
610 615 620

Gly Ser Pro Pro Leu Asp Arg Val Asp Val Leu Gln Pro Val Ser Trp
625 630 635 640

Ala Val Met Val Ser Leu Ala Gln Val Trp Arg Ser Leu Gly Val Glu
645 650 655

Pro Asp Ala Val Val Gly His Ser Gln Gly Glu Ile Ala Ala Ala Val
660 665 670

Val Cys Gly Ala Leu Thr Leu Pro Asp Ala Ala Arg Val Val Ala Leu
675 680 685

Arg Ser Gln Val Ile Gly Arg Val Leu Ser Gly Arg Gly Gly Met Ala
690 695 700

Ser Val Gln Leu Pro Ala Arg Glu Val Ala Gly Arg Leu Ala Ala Trp
705 710 715 720

Ala Gly Arg Leu Asp Val Ala Ala Val Asn Gly Pro Gln Ser Thr Val
725 730 735 

Val Ser Gly Ala Ala Asp Ala Val Thr Glu Leu Val Glu Ala Phe Ala 
740 745 750 

Ala Glu Asp Val Arg Val Arg Arg Ile Pro Val Asp Tyr Ala Ser His
755 760 765

Ser Thr Gln Val Asp Arg Leu Arg Ala Glu Leu Leu Thr Val Leu Gly
770 775 780

Pro Val Asp Ala Arg Pro Ala Gln Val Pro Phe Tyr Ser Thr Val Gln
785 790 795 800 

Gly Gly Arg Val Asp Thr Ala Gly Leu Asp Ala Gly Tyr Trp Tyr Arg
805 810 815

Asn Leu Arg Gly Gln Val Arg Phe Glu Glu Thr Val Arg Val Leu Leu
820 825 830

Asp Asp Gly His Arg Ala Phe Val Glu Ala Ala Ala His Ala Val Leu
835 840 845

Val Pro Ala Ile Gln Glu Leu Gly Asp Ser Ala Gly Val Arg Val Val
850 855 860

Ala Val Gly Ser Leu Arg Arg Glu Ala Gly Gly Leu Asp Arg Leu Leu
865 870 875 880

Ala Ser Ala Ala Glu Ala Phe Thr Gln Gly Val Ala Val Asp Trp Ser
885 890 895

Arg Ala Leu Ala Gly Ala Ala Arg Val Ala Val Asp Leu Pro Thr Tyr
900 905 910

Ala Phe Gln Arg Gln Arg Tyr Trp Leu Glu Pro Ala Ala Gln Ala Asp
915 920 925

Ser Gly Pro Ala Gly Asp Gly Trp Arg Tyr Arg Val Gly Trp Arg Arg
930 935 940

Leu Gln Arg Thr Gly Ala Ala Pro Ala Asp Arg Trp Leu Leu Val Thr
945 950 955 960

Gly Pro Glu Gln Pro Ala Glu Leu Val Glu Ala Val Arg Asp Ala Leu
965 970 975

Thr Ala Arg Gly Ala Glu Val Arg Leu Val Thr Val Glu Pro Thr Ser
980 985 990

Thr Asp Arg Ala Ala Cys Ala Ala Leu Leu Thr Ala Ala Gly Ala Gly 
995 1000 1005 

Gly Ala Thr Arg Val Leu Ser Leu Leu Gly Thr Asp Arg Arg Pro 
1010 1015 1020 

His Pro Asp His Pro Ala Val Ser Val Gly Ala Ala Ala Thr Leu 
1025 1030 1035 

Leu Leu Thr Gln Ala Val Ala Asp Ala Leu Pro Ala Ala Arg Leu
1040 1045 1050

Trp Val Val Thr Arg Gly Ala Val Ser Val Gly Pro Gly Glu Thr
1055 1060 1065

Ala Asp Glu Arg Gln Ala Gln Val Trp Gly Phe Gly Arg Val Ala
1070 1075 1080 

Ala Leu Glu Leu Pro Arg Thr Trp Gly Gly Leu Val Asp Leu Pro 
1085 1090 1095 

Ala Asp Ala Asp Gly Pro Val Trp Glu Ala Phe Val Asp Val Leu 
1100 1105 1110

Ala Gly Asp Glu Asp Gln Val Ala Leu Arg Gly Pro Val Gly Tyr
1115 1120 1125 

Gly Arg Arg Leu Arg Arg Ala Pro Ala Leu Pro Ala Lys Arg Arg 
1130 1135 1140 

Tyr Arg Pro Arg Gly Thr Val Leu Val Thr Gly Gly Thr Gly Ala 
1145 1150 1155

Leu Gly Ala His Val Ala Arg Arg Leu Ala Ala Gly Gly Ala Ala 
1160 1165 1170 

His Leu Val Leu Thr Ser Arg Arg Gly Ala Asp Ala Pro Gly Ala 
1175 1180 1185 

Ala Gly Leu Val Gly Glu Leu Arg Ala Leu Gly Ala Glu Val Thr 
1190 1195 1200

Val Ala Val Cys Asp Val Ala Asp Arg Ala Ala Val Ala Ala Leu
1205 1210 1215

Leu Ala Gly Leu Pro Ala Asp Ala Pro Leu Ser Ala Val Phe His
1220 1225 1230

Thr Ala Gly Val Ala His Ser Met Pro Ile Gly Glu Thr Gly Leu
1235 1240 1245

Thr Asp Val Ala Glu Val Phe Ala Gly Lys Val Ala Gly Ala Arg
1250 1255 1260

His Leu Asp Glu Leu Thr Arg Gly His Asp Leu Asp Ala Phe Val
1265 1270 1275

Leu Tyr Ser Ser Asn Ala Gly Val Trp Gly Ser Ser Gly Gln Ser
1280 1285 1290 

Ala Tyr Gly Ala Ala Asn Ala Ala Leu Asp Ala Leu Ala Glu Arg 
1295 1300 1305 

Arg Arg Ala Ala Gly Leu Thr Ala Thr Ser Val Ala Trp Gly Leu 
1310 1315 1320 

Trp Gly Ser Gly Gly Met Gly Glu Gly Asp Ala Glu Glu Tyr Leu 
1325 1330 1335 

Ser Arg Arg Gly Leu Arg Pro Met Pro Pro Glu Arg Gly Val Asp 
1340 1345 1350 

Ala Leu Leu Ala Ala Leu Asp Arg Asp Glu Thr Phe Val Ala Val 
1355 1360 1365 

Ala Asp Val Asp Trp Thr Leu Phe Thr Ala Gly Phe Thr Ala Phe 
1370 1375 1380 

Arg Pro Ser Pro Leu Leu Gly Asp Leu Pro Glu Ala Arg Ala Thr 
1385 1390 1395 

Leu Ala Asp Ala Gly Pro Ala Gly Ser Asp Leu Pro Ala Trp His 
1400 1405 1410 

Ala Ala Ala Ser Pro Asp Glu Arg Arg Arg Gly Leu Leu Asp Leu 
1415 1420 1425 

Val Arg Arg Gln Val Ala Ala Val Leu Gly His Pro Gly Pro Glu
1430 1435 1440

His Val Gly Pro Asp Ala Ala Phe Arg Glu Ile Gly Phe Asp Ser
1445 1450 1455 

Leu Thr Ala Val Asp Leu Ala Lys Arg Leu Arg Ala Ala Val Gly 
1460 1465 1470 

Val Pro Leu Ser Ala Thr Leu Val Phe Asp His Pro Thr Ala Thr 
1475 1480 1485 

Ala Val Ala Glu His Leu Ala Gly Leu Leu Gly Pro Ala Pro Ala 
1490 1495 1500 

Gly Gly Asp Pro Arg Glu Ala Glu Val Arg Arg Ala Leu Ala Asp 
1505 1510 1515 

Leu Pro Leu Ala Arg Leu Arg Asp Ala Gly Leu Leu Asp Gly Leu 
1520 1525 1530 

Leu Ala Leu Ala Gly Leu Asp Ala Asp Ala Val Pro Asp Gly Pro 
1535 1540 1545

Glu Pro Ala Pro Gly Asp Ala Ile Asp Glu Leu Asp Pro Glu Glu
1550 1555 1560 

Leu Val Arg Arg Val Leu Asp Asn Ala Ser Ser 
1565 1570 

<210> 17 
<211> 4725 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca
<400> 17 

atgtcgaacg agcagaagct ccgcgagtac ctgcggttga ccaccaccga gctggccagg 60 

gccaccgacc ggctgcgcgc ggtcgaggcg cgggcgcacg agccgatcgc gatcgtcggc 120

atggcctgcc ggtaccccgg cggggtcggc tcaccggagg aactgtggga gctggtcgcc 180 

tcgggcacgg acgcgatctc cccgttcccc gacgaccacg gctgggacgg cgacgcgctg 240 

tacgacccgg acccggaggc ggcgggccgc acctactgcc gcgagggcgg gttcctcgcc 300 

ggggtcggcg acttcgacgc cgcgttcttc ggcatctcgc cccgcgaggc gctggccatg 360 

gacccgcagc agcgcctgct gctggagacg tcctgggagg cgctggagcg ggccgggatc 420 

cccccggact cgctgcgcgg cagccgtacc ggggtgtgcg tcggggcgtg gcacggcggc 480 

tacaccgacg tcgtcgggca gcccccggcg gaactggagg gccacctgct gaccggcggg 540 

gtggtcagct tcacctcggg gcggatctcg tacgcgctgg gcctggaggg gcccgcgttg 600 

acggtggaca ccgcctgctc gtcctcgctg gtggccctgc acctggcggt gcgggccctg 660 

cggcagggcg agtgcgacct ggcgttggcc ggcggggcga cggtgctggc cagcccggcg 720 

gtgttcgtgc agttctcgcg gcagcggggg ctggccccgg acggccggtg caaggcgttc 780 

gccgactcgg cggacgggtt cgggccggcc gagggggtcg gcatgctggt cgtggagcgg 840

ctgtcggacg ccgtccgcca cgggcgccgg gtgctggccc tggtcaccgg cacggcggtc 900 

aaccaggacg gggcgagcaa cggcctcacc gcccccagcg gcccggcgca ggagaaggtg 960 

ctgcgccagg cgctcgtgga cgcccgggtg acggccgccg acgtcgacgc ggtcgaggcg 1020

cacggcaccg gcacccggct cggcgacccg atcgaggtgc gggccctgat gaacgtgtac 1080 

ggtgccggcc ggcccgccga ccgtccgctc tggctcggtt cgctgaagtc caacatcggc 1140 

cacacccagg cggcggccgg ggtcggcggg gtcatcaaga cggtgctggc gatgcggcac 1200 

ggcgtcctgc cgcccaccct gcacgtggac gccccgacca ccgaggtcga ctggtccgcc 1260 

ggccaggtgg ccctgctgcg ggcagagaca ccgtggccgg acacgggtcg cccgcgccgc 1320 

gccggggtct cctccttcgg ggtgagcggc accaacgcgc acgtggtgct ggagcaggcc 1380

cctgggcccg ccgccgcccc ggcgggtgac gccccgcccg ccgagacccg gcccgtcggc 1440

gacccgccgc cggtcgtacc gctggtgttg tccgccaggt cgcagccggc gctggccggg 1500

caggcccgcc ggctgcgcga cctgctggcc gcagcgccgg agaccgacct cgccagcgcc 1560

ggactcgccc tggccaccgc gcggtcggtg ttcgaccacc gggcggtggt gacggccgcc 1620

gggcgaccgc aggcgctcga cgcgctcgac ctgctggccg gcggcgaacc cggaccggcg 1680

gtcacgaccg gcgtcgccgc ccccaccggg cgcaccgtgt tcgtctttcc cgggcagggg 1740

acgcactggg ccggcatggg tgccgacctg ctcgaccagt caccggtgtt cgccgagtcg 1800

atgcgacggt gcgagcaggc gctgtcggcg cacaccgact ggaagctcgg cgaggtgatc 1860

cggggcgcgg ccggcagccc gccgctggac cgcgtggacg tgctccagcc cgtctcctgg 1920

gcggtgatgg tgtcgctggc gcaggtgtgg cggtcgctcg gcgtcgagcc ggacgcggtg 1980

gtcggccatt cccagggcga gatcgccgcc gcggtggtct gcggcgcgct gaccctgccg 2040

gacgcggccc gggtggtcgc gctgcggtcc caggtcatcg gtcgggtgct ctccggtcgc 2100

ggcggcatgg cgtccgtcca gctgccggcc cgggaggtcg cggggcggct ggccgcctgg 2160

gcgggccggc tcgacgtcgc ggccgtcaac gggccacagt cgaccgtcgt gtccggtgcc 2220

gccgacgcgg tcaccgaact ggtcgaggcg ttcgcggccg aggacgtccg ggtgcggcgg 2280

atcccggtgg actacgcgtc ccactcgacg caggtggacc ggctgcgcgc cgagctgctc 2340

accgtcctgg gcccggtcga cgcccgtccg gcgcaggtgc ccttctactc gacggtgcag 2400

ggcgggcgcg tcgacactgc cggcctggac gccggctact ggtaccgcaa cctgcggggg 2460

caggtccgct tcgaggagac cgtgcgggtg ctgctcgacg acgggcaccg cgccttcgtc 2520

gaggccgccg cgcacgccgt cctcgtaccc gcgatccagg agctggggga cagcgccggc 2580

gtccgggtgg tggccgtggg gtcgctgcgc cgggaggcgg gcggcctgga ccggctcctg 2640

gcctcggcgg ccgaggcgtt cacccagggg gtggccgtgg actggtcccg ggctctggcc 2700

ggggccgcgc gcgtcgccgt ggacctgccc acgtacgcgt tccagcggca acgctactgg 2760

ctggagcccg ccgcgcaggc ggactccggc ccggccgggg acggctggcg ctaccgggtc 2820


[row illegible in source] 2880


ggcccggagc agccggcgga gctggtcgag gcggtgcgcg acgcgctcac cgcgcggggc 2940

gccgaggtgc gcctggtgac cgtcgagccg accagcaccg accgggccgc gtgcgcggcg 3000

ttgctcaccg cggccggtgc gggcggggcg acccgggtgc tgtcgctgct cggcaccgat 3060

cgtcgcccgc accccgacca cccggccgtg tccgtcggcg ccgccgcgac gttgctgctg 3120

acccaggccg tcgccgacgc cctgccggcc gcccggctgt gggtcgtcac ccggggcgcg 3180 

gtctccgtcg ggcccggcga gaccgccgac gagcgccagg cgcaggtctg ggggttcggc 3240 

cgggtcgcgg ccctcgaact gccccgcacg tggggcgggc tcgtcgacct gcccgccgac 3300 

gcggacggcc cggtgtggga ggcgttcgtg gacgtgctgg ccggggacga ggaccaggtc 3360 

gcgctgcgcg gcccggtcgg gtacggtcgc cggctccggc gcgcccccgc gctacccgcg 3420 

aagcggcggt accggcccag gggcaccgtc ctggtcaccg gcggcaccgg cgcgctcggc 3480 

gcgcacgtgg cccggcggtt ggccgccggc ggggccgcgc acctcgtgct caccagccgg 3540 
cgcggggccg acgcccccgg tgcggccggg ctggtcgggg aactccgggc gctgggcgcc 3600

gaggtgaccg tcgcggtctg cgacgtcgcc gaccgggccg ccgtggcggc gctgctcgcc 3660 

gggctgcccg ccgacgcgcc gctgagcgcg gtcttccaca ccgcgggcgt ggcgcactcg 3720 

atgccgatcg gcgagaccgg gctcaccgac gtcgccgagg tgttcgccgg gaaggtcgcc 3780 

ggagcccgcc acctcgacga actcacccgg gggcacgacc tggacgcgtt cgtcctgtac 3840 

tcgtcgaacg cgggcgtgtg gggcagcagc gggcagagcg cgtacggggc ggccaacgcg 3900 

gccctcgacg cgctcgccga acggcggcgc gccgccgggc tgaccgccac ctccgtcgcc 3960 

tggggcctgt ggggctccgg gggcatgggc gagggcgacg ccgaggagta cctgagccgc 4020 

cggggcctgc ggccgatgcc tcccgagcgt ggcgtggacg ccctcctggc cgccctggac 4080 

cgggacgaga ccttcgtcgc cgtcgccgac gtggactgga cgctgttcac ggccgggttc 4140 

accgcgttcc ggcccagccc gctgctcggc gacctcccgg aggcccgcgc gacgctggcc 4200 

gacgccggac ccgcgggctc cgacctgccg gcctggcacg ccgccgcgag ccccgacgaa 4260 

cgccgccggg gcctgctcga cctggtacgc cggcaggtcg ccgccgtcct cggccacccg 4320

gggcccgagc acgtcggccc cgacgccgcg ttccgggaga tcggattcga ctcgctgacc 4380 

gccgtcgacc tggccaagcg gctcagggcg gcggtcggcg tgccgctgtc cgccaccctc 4440 

gtcttcgacc accccaccgc gacggcggtc gccgagcacc tggccgggct gctcggtccc 4500 

gcgccggccg gcggcgaccc gcgcgaggcc gaggtgcgcc gggccctggc cgacctgccg 4560 

ctggcccggc tgcgggacgc cggcctactg gacggcctgc ttgcgcttgc ggggctggac 4620 

gccgacgcgg tgccggacgg gcccgagccg gctcccggcg acgccatcga cgaactcgat 4680 

ccagaggagc tggtgcgccg ggtgctggac aacgccagct cctga 4725

<210> 18

<211> 1784

<212> PRT

<213> micromonospora carbonacea subspecies aurantiaca

<400> 18

Met Val Met Pro Pro Asp Lys Val Ile Glu Ala Leu Arg Val Ser Val
1 5 10 15

Lys Glu Thr Glu Arg Leu Arg Arg Gln Asn His Glu Leu Leu Ala Ala
20 25 30

Leu His Gly Pro Ile Ala Val Val Gly Met Ala Cys Arg Tyr Pro Gly
35 40 45

Gly Val Ser Ser Pro Glu Asp Leu Trp Arg Leu Val Glu Thr Gly Thr
50 55 60

Asp Ala Ile Gly Gly Phe Pro Thr Asp Arg Gly Trp Asp Val Asp Ala
65 70 75 80

Val Tyr Asp Pro Asp Pro Glu Ser Arg Asn Thr Thr Tyr Cys Arg Glu
85 90 95

Gly Gly Phe Leu Ala Gly Ala Gly Asp Phe Asp Ala Ala Phe Phe Gly
100 105 110

Val Ser Pro His Glu Ala Val Val Met Asp Pro Gln Gln Arg Leu Leu
115 120 125

Leu Glu Val Ser Trp Glu Ala Leu Glu Arg Ser Gly Thr Asp Pro His
130 135 140

Ser Leu Arg Gly Ser Arg Thr Gly Val Tyr Val Gly Ala Ala His Gln
145 150 155 160

Gly Tyr Ala Val Asp Ala Gly Gln Val Pro Glu Gly Ala Glu Gly Phe
165 170 175

Arg Leu Thr Gly Ser Ala Asp Ala Val Leu Ser Gly Arg Ile Ser Tyr
180 185 190

Leu Leu Gly Leu Glu Gly Pro Ala Leu Thr Val Glu Thr Ala Cys Ser
195 200 205

Ser Ser Leu Val Ala Val His Leu Ala Val Gln Ala Leu Arg Arg Gly
210 215 220

Glu Cys Gly Leu Ala Leu Ala Gly Gly Val Ala Val Met Pro Asp Pro
225 230 235 240

Ala Ala Phe Val Glu Phe Ser Arg Gln Arg Gly Leu Ala Ala Asp Gly
245 250 255

Arg Cys Arg Ala Phe Gly Ala Gly Ala Asp Gly Thr Gly Trp Ala Glu
260 265 270

Gly Val Gly Val Leu Val Leu Gln Arg Leu Ser Asp Ala Val Arg Asp
275 280 285

Gly Arg Trp Val Leu Gly Val Ile Arg Gly Ser Ala Val Asn Gln Asp
290 295 300

Gly Ala Ser Asn Gly Leu Thr Ala Pro Ser Gly Pro Ala Gln Gln Arg
305 310 315 320

Val Ile Arg Gln Ala Leu Thr Asp Ala Arg Leu Gly Ala Asp Gln Ile
325 330 335

Asp Ala Val Glu Ala His Gly Thr Gly Thr Arg Leu Gly Asp Pro Ile
340 345 350

Glu Ala Gln Ala Leu Ile Ala Ala Tyr Gly Ala Asp Arg Thr Pro Asp
355 360 365

Arg Pro Leu Trp Leu Gly Ser Leu Lys Ser Asn Ile Gly His Ala Gln
370 375 380

Ala Ala Ala Gly Val Gly Gly Leu Ile Lys Met Leu Leu Ala Met Arg
385 390 395 400

Ala Gly Thr Leu Pro Pro Thr Leu His Ala Asp Val Pro Thr Pro Leu
405 410 415

Val Asp Trp Ser Ala Gly Val Val Arg Leu Ser Thr Gly Val Val Pro
420 425 430

Trp Pro Ala Leu Pro Gly Ala Pro Arg Arg Ala Gly Ile Ser Ala Phe
435 440 445

Gly Val Ser Gly Thr Asn Ala His Val Ile Val Glu Gln Pro Pro Pro
450 455 460

Val Pro Val Asp Asp Pro Ala Pro Pro Thr Arg Thr Leu Pro Leu Val
465 470 475 480

Pro Trp Val Leu Ser Gly Arg Thr Glu Ala Ala Leu Arg Ala Gln Ala
485 490 495

Asp Arg Leu Arg Thr His Leu Ala Ala His Pro Asp Ala Asp Pro Leu
500 505 510

Asp Val Gly Phe Ser Leu Ala Thr Ser Arg Ala Ala Leu Glu His Arg
515 520 525

Ala Val Leu Val Ala Ala Asp Arg Asp Gly Leu Leu Arg Leu Val Asp
530 535 540

Ala Leu Ala Ala Gly Glu Pro Ala Ala Gly Leu Ile Arg Gly Thr Val
545 550 555 560

Arg His Asp Arg Arg Thr Gly Phe Leu Phe Ala Gly Gln Gly Gly Gln
565 570 575

Arg Val Gly Met Ala Arg Glu Leu Tyr Glu Ala Phe Pro Ala Phe Ala
580 585 590

Asp Ala Leu Asp Gln Leu Ala Ala Arg Leu Asp Arg His Leu Asp Arg
595 600 605

Pro Leu Leu Arg Val Leu Phe Ala Glu Pro Gly Ser Asp Asp Ala Arg
610 615 620

Leu Leu Asp Gly Thr Arg Tyr Ala Gln Ala Ala Leu Phe Ala Val Glu
625 630 635 640

Val Ala Leu Phe Arg Leu Val His Gly Trp Gly Val Arg Pro Asp Val
645 650 655

Leu Leu Gly His Ser Val Gly Glu Leu Ala Ala Ala His Val Ala Gly
660 665 670

Val Leu Asp Val Asp Asp Ala Cys Glu Leu Val Ala Ala Arg Gly Arg
675 680 685

Leu Met Gly Glu Leu Pro Ser Gly Gly Ala Met Val Ala Val Arg Ala
690 695 700

Thr Glu Glu Glu Val Gly Pro Leu Leu Asp Gly Gln Arg Val Ala Val
705 710 715 720

Ala Ala Val Asn Gly Pro Arg Ser Val Val Val Ser Gly Asp Glu Glu
725 730 735

Ala Val Leu Ala Val Ala Ala Arg Cys Ala Ala Leu Gly His Arg Thr
740 745 750

Arg Arg Leu Asn Val Ser His Ala Phe His Ser Pro His Val Glu Ala
755 760 765

Met Leu Glu Pro Phe Arg Arg Val Ala Arg Gly Leu Thr Tyr His Ala
770 775 780

Pro Thr Ile Pro Val Val Ser Asn Ala Thr Gly Arg Leu Ala Thr Ala
785 790 795 800

Asp Ala Leu Arg Asp Pro Gly Tyr Trp Val Arg His Val Arg Gln Pro
805 810 815

Val Arg Phe Arg Asp Gly Val Arg Ala Ala Arg Asp Gln Gly Ala Thr
820 825 830

Ala Phe Val Gly Leu Gly Pro Asp Gly Val Leu Cys Ala Leu Ala Glu
835 840 845

Glu Cys Leu Gly Pro Thr Gly Asp Val Leu Leu Leu Pro Val Leu Arg
850 855 860

Pro Gly Arg Pro Glu Pro Ala Thr Leu Leu Ala Ala Leu Ala Gly Ala
865 870 875 880

Tyr Ala Gly Gly Ala Glu Met Asp Trp Ser Arg Val Phe Ala Gly Thr
885 890 895

Gly Ala Arg Arg Val Glu Leu Pro Thr Tyr Ala Phe Gln His Arg Arg
900 905 910

Tyr Trp Leu Ala Pro Gly Pro Pro Ser Ala Arg Arg Asp Asp Ala Trp
915 920 925

Arg Tyr Arg Ile Ala Trp Arg Pro Leu Pro Thr Val Pro Ala Ala Ala
930 935 940

Gly Thr Glu Thr Val Ala Gly Ala Trp Leu Leu Val Val Pro Ala His
945 950 955 960

Asp Gly Val Ala Ser Leu Ala Asp Ala Ala Glu Arg Ala Val His Arg
965 970 975

Gly Gly Ala Thr Val Thr Arg Leu Thr Val Asp Ala Ala Asp Val Asp
980 985 990

Arg Asp Thr Leu Ala Ala Val Leu Thr Glu Ala Ala Ala Asp Ala Asp
995 1000 1005

Gly Gly Pro Asp Gly Val Leu Cys Leu Leu Gly Leu Asp Asp Arg
1010 1015 1020

Ala His Pro Arg Ser Ala Ser Val Pro Arg Gly Val Leu Ala Thr
1025 1030 1035

Leu Ser Leu Ala Gln Ala Leu Thr Asp Leu Gly Ala Ser Ala Arg
1040 1045 1050

Leu Trp Cys Val Thr Arg Gly Ala Val Ala Val Thr Pro Gly Glu
1055 1060 1065

Ser Pro Ser Val Ala Gly Ala Gln Leu Trp Gly Phe Gly Arg Val
1070 1075 1080

Ala Ala Leu Glu Leu Pro Arg Ser Trp Gly Gly Leu Val Asp Leu
1085 1090 1095

Pro Val Asp Pro Asp Asp Arg Asp Trp Asp Leu Leu Arg Arg Ala
1100 1105 1110

Leu Arg Gly Pro Glu Asp Gln Val Ala Val Arg Gly Ala Val Gly
1115 1120 1125

Tyr Ala Arg Arg Leu Val Pro Ala Pro Ala Pro Arg Ala Glu Arg
1130 1135 1140

Ala Trp Arg Pro Arg Gly Thr Val Leu Val Thr Gly Gly Thr Gly
1145 1150 1155

Ala Leu Gly Ala His Thr Ala Arg Trp Leu Ala Arg Asn Gly Ala
1160 1165 1170

Thr His Leu Val Leu Thr Ser Arg Arg Gly Gly Asn Ala Pro Gly
1175 1180 1185

Val Ala Ala Leu Arg Ala Glu Leu Val Thr Leu Gly Ala Glu Val
1190 1195 1200

Thr Val Val Ala Cys Asp Val Ala Asp Arg Glu Ala Val Ala Gly
1205 1210 1215

Leu Leu Ala Gly Ile Pro Arg Ala Ala Pro Leu Thr Ala Val Phe
1220 1225 1230

His Ala Ala Gly Val Pro Gln Val Thr Pro Leu His Glu Thr Thr
1235 1240 1245

Pro Glu Leu Phe Ala Gln Val Cys Ala Gly Lys Val Ala Gly Ala
1250 1255 1260

Val His Leu His Glu Leu Ala Gly Asp Leu Asp Ala Phe Val Thr
1265 1270 1275

Phe Ala Ser Ala Ala Gly Val Trp Gly Ser Gly Gly Gln Cys Ala
1280 1285 1290

Tyr Ala Ala Ala Asn Ala Ala Leu Asp Ala Leu Ala Glu Arg Arg
1295 1300 1305

Arg Ala Ala Gly Leu Pro Ala Thr Ser Val Ala Trp Gly Val Trp
1310 1315 1320

Gly Gly Pro Gly Met Gly Ala Gly Ala Gly Glu Glu Tyr Leu Arg
1325 1330 1335

Arg Arg Gly Val Arg Ala Met Pro Pro Ala Ala Ala Leu Ala Ala
1340 1345 1350

Leu Gly Arg Ile Leu Asp Ala Asp Glu Thr Gly Val Thr Val Ser
1355 1360 1365

Asp Thr Glu Trp Gly Arg Phe Ala Ser Gly Phe Ala Ala Ala Arg
1370 1375 1380

Pro Ala Pro Leu Leu Ala Glu Leu Pro Gly Gly Asp Val Asp Pro
1385 1390 1395

Ala Gly Pro Ala His Arg Ala Gln Pro Pro Val Pro Arg Pro Ala
1400 1405 1410

Pro Ala Ala Thr Asp Arg Pro Gly Leu Leu Ala Leu Val Arg Ala
1415 1420 1425

Glu Ala Ala Gly Val Leu Gly His Asp Gly Ala Asp Asp Val Pro
1430 1435 1440

Ala Asp Ala Glu Phe Ser Ala Leu Gly Phe Asp Ser Leu Ala Ala
1445 1450 1455

Val Gln Leu Arg Arg Arg Leu Ala Glu Ala Thr Gly Leu Ser Leu
1460 1465 1470

Ser Ala Pro Val Leu Phe Asp His Arg Thr Pro Asp Ala Leu Ala
1475 1480 1485

Ala His Leu His Gly Leu Leu Thr Gly Ala Ala Gly Gly Pro Pro
1490 1495 1500

Ala Pro Ala Ala Gly Ser Ala Leu Val Glu Met Tyr Arg Arg Ala
1505 1510 1515

Val Ala Thr Gly Arg Ala Ala Glu Ala Val Glu Val Leu Gly Thr
1520 1525 1530

Val Ala Thr Phe Arg Pro Val Phe Arg Ser Pro Asp Glu Leu Gly
1535 1540 1545

Glu Pro Pro Ala Leu Val Pro Leu Gly Thr Gly Ala Gly Gly Pro
1550 1555 1560

Ala Leu Val Cys Cys Ala Gly Thr Ala Ala Ala Ser Gly Pro Arg
1565 1570 1575

Glu Phe Thr Ala Phe Ala Ala Ala Leu Ala Gly Leu Arg Asp Val
1580 1585 1590

Thr Val Leu Pro Gln Thr Gly Phe Leu Pro Gly Glu Pro Leu Pro
1595 1600 1605

Ala Gly Leu Asp Val Leu Leu Asp Ala Gln Ala Asp Ala Val Leu
1610 1615 1620

Ala His Cys Ala Gly Gly Pro Phe Val Leu Val Gly His Ser Ala
1625 1630 1635

Gly Ala Asn Met Ala His Ala Leu Thr Val Arg Leu Glu Ala Arg
1640 1645 1650

Gly Ala Asp Pro Ala Ala Leu Val Leu Met Asp Ile Tyr Thr Pro
1655 1660 1665

Ala Ala Pro Gly Ala Met Gly Val Trp Arg Glu Glu Met Leu Ala
1670 1675 1680

Trp Val Ala Glu Arg Ser Val Val Pro Val Asp Asp Thr Arg Leu
1685 1690 1695

Thr Ala Met Gly Ala Tyr His Arg Leu Leu Leu Asp Trp Ala Pro
1700 1705 1710

Arg Pro Thr Arg Ala Pro Val Leu His Leu Tyr Ala Gly Glu Pro
1715 1720 1725

Ala Gly Ala Trp Pro Asp Pro Arg Gln Asp Trp Arg Ser Arg Phe
1730 1735 1740

Asp Gly Ala His Thr Ser Ala Glu Val Pro Gly Thr His Phe Ser
1745 1750 1755

Met Met Thr Glu His Ala Pro Val Thr Ala Ala Thr Val His Lys
1760 1765 1770

Trp Leu Asp Glu Val Cys Pro Pro Arg Val Pro
1775 1780

<210> 19 
<211> 5355 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 19 

atggtcatgc cccccgacaa ggtgatcgag gcgctgcgtg tctccgtcaa ggagacggag 60 

cggctgcgcc ggcagaacca cgagctgctc gccgccctgc acgggccgat cgccgtcgtg 120 

ggcatggcct gccgctaccc gggcggggtg tcctctccgg aggacctgtg gcggctggtc 180 

gagacgggca cggacgcgat cggcggcttc cccaccgacc gtggctggga cgtcgacgcc 240 

gtgtacgacc cggatcctga gtcgcggaac accacctact gccgggaggg cgggttcctg 300 

gccggggcag gagacttcga cgccgcgttc ttcggggtgt cgccgcacga ggccgtggtc 360 

atggaccccc agcagcggct gcttctggag gtgtcctggg aggcgctgga gcggtccggg 420 

accgacccgc acagcctgcg cggctcgcgc accggggtct acgtcggtgc ggcccaccag 480 

gggtacgcgg tcgacgccgg tcaggtgccg gagggcgcgg aggggttccg gctgaccggc 540 

agcgccgacg ccgtcctgtc cggacggatc tcctacctgc tcgggctgga gggtccggcc 600

ctgaccgtcg agacggcctg ctcgtcctcg ctggtggcgg tgcacctcgc ggtgcaggcg 660 

ctgcgccggg gcgagtgcgg gctggcactg gccggcgggg tcgccgtgat gcccgacccg 720

gcggcattcg tggagttctc ccggcagcgg ggcctcgcgg cggacgggcg ctgccgggcg 780 

ttcggggcgg gcgcggacgg caccggctgg gcggagggcg tcggtgtgct ggtcctgcaa 840 

cggctctccg acgcggtgcg cgacggccgc tgggtgctgg gcgtgatccg gggttcggcc 900 

gtcaaccagg acggggccag caacgggctg accgccccga gcggccccgc ccagcagcgg 960 

gtcatccggc aggcgctgac cgacgcccgg ctcggcgccg accagatcga cgcggtcgag 1020 

gcgcacggca cgggcacccg gctcggcgac ccgatcgagg cgcaggcgct gatcgccgcc 1080 

tacggcgccg accggacccc ggaccggccg ctctggctcg gctcgttgaa gtcgaacatc 1140 

gggcacgccc aggcggcggc cggcgtcggc ggcctgatca agatgctcct ggcgatgcgg 1200 

gccgggacgc tcccacccac cctgcacgcc gacgtcccga ccccgctggt cgactggtcc 1260

gccggtgtcg tccggctgtc gaccggggtg gtgccctggc ccgcgttgcc cggggcgccc 1320

cgcagggccg ggatctccgc gttcggggtg agcggcacca acgcgcacgt gatcgtcgag 1380

cagccgccgc cggtcccggt cgacgacccg gcgccaccca cgaggaccct gccgctggtg 1440

ccgtgggtgc tctccggccg gacggaggcg gcgctgcgcg cccaggcgga ccggttgcgt 1500

acgcacctgg cggcgcaccc cgacgcggac ccgctggacg tgggattctc cctggccacc 1560 

agccgggccg cgctggagca ccgggccgtg ctggtggccg ccgaccgcga cggcctgctc 1620

cgcctcgtcg acgcgctggc cgccggcgag ccggcggcgg gcctgatccg gggcacggta 1680 

cgtcacgatc gccggaccgg gttcctcttc gccgggcagg gcggccagcg cgtcgggatg 1740 

gcgcgcgaac tgtacgaggc gttccccgcc ttcgccgacg ccctggacca gctcgccgcc 1800 

cggctggacc ggcacctcga tcgtccgctg ctgcgggtgc tgttcgccga gccggggtcg 1860 

gacgacgccc ggctgctcga cggcacccgg tacgcgcagg ccgccctctt cgccgtcgag 1920

gtggcgttgt tccgactggt ccacggctgg ggggtccggc ccgacgtgct gctcggccac 1980 

tcggtgggcg agctggcggc cgcgcacgtg gccggcgtac tcgacgtgga cgacgcgtgc 2040 

gagctggtcg cggcgcgggg ccggctgatg ggggagctgc cgtcgggcgg cgcgatggtg 2100 

gcggtccggg ccaccgagga ggaggtcggg cccctgctcg acgggcagcg ggtcgcggtg 2160 

gcggcggtca acggcccgcg ctcggtcgtg gtctccggcg acgaggaggc ggtgctggcc 2220 

gtggccgccc ggtgcgccgc cctcggccac cggacgcgac gcctcaacgt cagccacgcg 2280 

ttccactccc cgcacgtgga ggcgatgctg gagccgttcc ggcgggtggc gcggggcctg 2340 

acgtaccatg ccccgacgat cccggtggtg tcgaacgcga cgggccggct cgccaccgcc 2400 

gacgcgctgc gcgaccccgg ttactgggtc cggcacgtcc gccagcccgt ccggttccgg 2460 

gacggggtgc gggccgcccg cgaccagggg gccaccgcct tcgtcgggct cggcccggac 2520 

ggggtgctgt gcgcgttggc cgaggagtgc ctcgggccca ccggcgacgt gctgctgctg 2580 

ccggtgctgc gccccggtcg gccggagccc gccaccctgc tggccgccct ggccggggcg 2640

tacgccggcg gcgcggaaat ggactggtcc cgggtgttcg cgggcaccgg cgcgcgcagg 2700 

gtcgagctgc ccacgtacgc cttccagcac cggcgctact ggctggcgcc gggcccgccg 2760 

tcggcccgcc gcgacgacgc ctggcggtac cggatcgcct ggcggcccct gccgaccgtg 2820 

cccgccgccg ccgggaccga gacggtggcc ggggcgtggt tgctggtggt ccccgcccac 2880 

gacggcgtcg cgtcgctcgc cgacgccgcc gagcgggccg tgcaccgggg cggggccacg 2940

gtcacccggc tgacggtgga cgccgccgac gtggaccggg acaccctcgc cgccgtgctg 3000

accgaggccg ccgccgacgc ggacggcggg ccggacgggg tgctctgcct gctgggcctc 3060

gacgaccggg cacatccccg gtccgcctcg gtgccccgcg gggtgctggc gaccctgtcc 3120

ctcgcccagg ccctgaccga cctgggggcc tccgcgcggc tgtggtgcgt gacccggggg 3180

gcggtcgccg tgacgcccgg cgagtccccg tcggtcgccg gagcccagtt gtggggcttc 3240

ggccgcgtgg ccgcgctcga actcccccgg tcctggggcg gcctggtgga cctgccggtc 3300

gacccggacg accgggactg ggacctgctg cggcgcgcgc tgcgcggccc ggaggaccag 3360

gtcgcggtcc ggggggcggt cgggtacgcc cggcggctgg tccccgcgcc cgcgccccgg 3420

gccgagcggg cctggcgtcc gcgcggcacg gtcctggtga ccggcggtac gggcgcgctc 3480

ggcgcgcaca cggcccgctg gctggcgcgc aacggcgcca cgcacctcgt cctcaccagc 3540

cgccggggcg ggaacgcccc cggggtcgcc gcgctgcggg cggaactggt cacgctcggt 3600

gccgaggtga ccgtggtcgc ctgcgacgtc gccgaccggg aggccgtggc cggcctgctc 3660

gccgggattc cccgcgccgc tccgctcacc gccgtgttcc acgcggcggg cgtgccccag 3720

gtgacgccgc ttcacgagac gaccccggag ttgttcgcgc aggtctgcgc aggcaaggtc 3780

gccggggcgg tgcacctgca cgagttggcc ggtgacctgg acgccttcgt caccttcgcc 3840

tccgccgccg gggtgtgggg cagcggcggg cagtgcgcgt acgctgcggc caacgccgcc 3900

ctcgacgcgc tcgccgagcg tcgtcgcgcc gcagggctgc ccgcgacctc cgtcgcctgg 3960

ggggtctggg gcgggcccgg catgggggcg ggcgcggggg aggagtacct gcgccgccgg 4020

ggcgtccggg cgatgccccc ggcagccgcc ctcgccgccc tcgggcggat cctggacgcc 4080

gacgagaccg gggtgacggt ctccgacacc gagtggggcc ggttcgcgtc cggcttcgcc 4140

gccgcgcgtc ccgccccgct gctcgccgag ctgccgggcg gggacgtcga tccggccggc 4200

ccggcgcacc gggcgcagcc gcccgtgccc cgaccggccc cggcagccac cgaccgcccc 4260

gggctgctgg cgctggtccg cgccgaggcc gccggggtgc tggggcacga cggtgccgac 4320

gacgttccgg ccgacgcgga gttctccgcc ctcggcttcg actcgctcgc cgccgtccag 4380

ctgcgccgcc ggctcgccga ggccaccggc ctgagcctct cggccccggt tctgttcgac 4440

caccgcaccc ctgacgcgct cgccgcgcac ctgcacggcc tgctcaccgg cgcggcgggc 4500

gggccacccg cgccggccgc cgggagcgcc ctggtcgaga tgtaccggcg ggccgtcgcc 4560

accggccgcg ccgccgaggc ggtggaggtg ctcggcaccg tcgccacgtt ccggccggtg 4620

ttccggtccc cggacgaact gggcgagcca ccggccctcg tcccgctcgg caccggggcg 4680

gggggacccg cgctggtctg ctgcgcgggc acggccgcgg cgtccggccc ccgcgagttc 4740

acggcgttcg ccgccgcgct ggccggtctc cgggacgtca ccgtccttcc gcagaccggc 4800

ttcctgcccg gcgagccgct gcccgccggg ctggacgtgc tgctcgacgc ccaggccgac 4860

gccgtcctgg cccactgcgc cgggggaccc ttcgtcctgg tcggccactc ggccggggcg 4920

aacatggcgc acgcgctgac ggtccgcctg gaggcgcggg gcgcggaccc cgccgcgctg 4980

gtgctgatgg acatctacac gcccgccgcc ccgggggcga tgggggtgtg gcgcgaggag 5040

atgctggcct gggtcgccga gcggtccgtc gtccccgtcg acgacacgcg gctgaccgcg 5100

atgggcgcct atcaccggct gctcctggac tgggcgcccc ggccgacccg ggcacccgtg 5160






[residues 5161-5180 illegible in source] accggcgggc gcctggccgg atccccggca ggactggcgt 5220


tcgcgcttcg acggcgcgca caccagcgcc gaggtgcccg gcacccactt ctcgatgatg 5280

accgagcacg cccccgtcac cgccgcgacc gtgcacaagt ggctcgacga ggtgtgcccg 5340

ccccgcgttc cgtga 5355



<210> 20 

<211> 464 

<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 

<400> 20 

Val Thr Arg Thr Pro Gly Pro Ser Arg Arg Val Arg Arg Gln Gln Glu
1 5 10 15

Arg Lys Arg Met Ile Thr Val Pro Pro Asp Gly Asp Pro Ala Thr Trp
20 25 30

Ala Arg Arg Leu Gln Leu Thr Arg Ala Ala Gln Trp Phe Ala Gly Asn
35 40 45

His Gly Asp Pro Tyr Ala Leu Ile Leu Arg Ala Glu Thr Asp Asp Pro
50 55 60

Thr Pro Tyr Glu Gln Arg Val Ala Ala Gln Pro Leu Phe Arg Ser Glu
65 70 75 80

Gln Leu Asp Thr Trp Val Thr Gly Asp Ala Ala Leu Ala Arg Glu Val
85 90 95

Leu Thr Asp Asp Arg Phe Gly Trp Leu Thr Arg Ala Gly Gln Arg Pro
100 105 110

Ala Glu Arg Thr Leu Pro Leu Ala Gly Thr Ala Leu Asp His Gly Pro
115 120 125

Glu Ala Arg Arg Arg Leu Asp Ala Leu Ala Gly Phe Gly Gly Pro Val
130 135 140

Leu Arg Ala Asp Ala Ala Gly Ala Arg Thr Arg Val Val Glu Thr Thr
145 150 155 160

Ala Val Leu Leu Asp Gly Ile Gly Glu Arg Phe Asp Leu Ala Val Leu
165 170 175

Ala Arg Arg Leu Val Ala Ala Val Leu Ala Asp Leu Leu Gly Val Pro
180 185 190

Ala Ala Arg Arg Gly Arg Phe Ala Glu Ala Leu Ala Ala Ala Gly Arg
195 200 205

Thr Leu Asp Ser Arg Leu Cys Pro Gln Thr Val Ala Thr Ala Leu Ala
210 215 220

Thr Val Ala Ala Thr Ala Glu Leu Thr Asp Leu Leu Gly Glu Val Pro
225 230 235 240

Pro Pro Pro Ser Leu Ser Pro Ser Ala Ala Gly Ser Gly Pro Pro Arg
245 250 255

Pro Ser Ala Ala Gly Ser Trp Pro Pro Leu Pro Ala Asp Asp Arg Thr
260 265 270

Ala Ala Ala Leu Ala Leu Ala Val Gly Thr Ala Glu Pro Ala Ile Thr
275 280 285

Leu Leu Cys Asn Ala Val Gly Ala Leu Leu Asp Arg Pro Gly Gln Trp
290 295 300

Ala Leu Leu Gly Gly Asp Leu Asp Arg Ser Ala Ala Val Val Glu Glu
305 310 315 320

Thr Leu Arg Cys Leu Pro Pro Val Arg Leu Glu Ser Arg Val Ala Gln
325 330 335

Gln Asp Val Thr Leu Gly Gly Gln Phe Leu Pro Ala Asp Ser His Leu
340 345 350

Val Val Leu Val Ala Met Ala Asn Arg Gly Pro Arg Ala Ala Thr Ala
355 360 365

Pro Ser Pro Asp Ala Phe Asp Pro Gly Gly Ser Arg Val Pro Ala Arg
370 375 380

Asp Val Val Gly Leu Pro Gln Leu Ala Gly Ala Gly Pro Leu Ile Arg
385 390 395 400

Leu Val Val Thr Thr Ala Leu Arg Thr Leu Ala Glu Ala Leu Pro Thr
405 410 415

Leu Arg Arg Ala Ser Gly Gly Val Arg Trp Arg Arg Ser Pro Val Leu
420 425 430

Leu Gly His Ala Arg Phe Pro Val Ala Arg Ala Glu Ser Gly Glu Gln
435 440 445

Arg Ser Asp Asp Arg Pro Ala Leu Glu Glu Ala Ile Arg Cys Ala Ser
450 455 460

<210> 21 
<211> 1395 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca
<400> 21 

gtgacccgta cgccgggtcc gtcccggcga gtccgacgac agcaggagag gaagcgcatg 60

atcacagtcc cgcccgacgg ggatcccgcg acctgggccc gccggctgca actgacccgc 120 

gccgcgcagt ggttcgccgg caaccacggc gacccgtacg cgctgatcct gcgcgcggag 180 

accgacgacc cgaccccgta cgagcagcgg gtggccgccc agccgctgtt ccgcagcgag 240 

cagttggaca cctgggtgac cggggacgcc gcgctggccc gggaggtgtt gaccgacgac 300 

cggttcggct ggctgacccg ggctgggcag cggcccgccg agcggaccct gccgctggcc 360 

ggcacggcac tggaccacgg gccggaggcc cggcgtcggc tggacgcgct cgccgggttc 420 

ggcgggccgg tcctgcgggc cgacgccgca ggggcgcgta cccgggtcgt ggagaccacc 480 

gcggtcctgc tcgacgggat cggggagcgg ttcgacctgg ccgtgctcgc ccggcggctg 540 

gtcgctgcgg tgctggccga cctgctgggg gtgcccgccg cgcggcgggg ccgcttcgcc 600 

gaggcactcg ccgccgccgg ccgtacgctg gacagccggc tgtgcccgca gaccgtggcg 660 

accgctctcg ccaccgtcgc cgccaccgcc gagctgaccg acctgctggg cgaggtgccg 720 

cccccgccgt cgctgtcccc gtccgccgcc ggctccgggc cgccgcgtcc gtccgcagcc 780

ggttcctggc cgccgctgcc ggctgacgac cggacggccg ccgcgctcgc gctggcggtc 840

ggcacggccg aaccggcgat caccctgctc tgcaacgcgg tcggtgcgct gctcgaccgc 900

cccgggcagt gggccctgct cggtggggac ctcgaccggt ccgccgccgt cgtcgaggag 960

accctgcgct gccttccgcc ggtgcgcctg gagagccgcg tcgcgcagca ggacgtcacc 1020

ctgggcgggc agttcctccc ggcggacagc cacctggtcg tgctggtcgc catggcgaac 1080

cggggtccgc gcgcggcgac cgccccgagc ccggacgcgt tcgaccctgg cgggtcgcgc 1140

gtcccggccc gcgacgtggt gggcctgccg cagcttgccg gcgccgggcc gctgatcaga 1200 

ctcgtcgtca cgaccgccct gcggaccctc gccgaggcgc tgcccacgct gcggcgggcg 1260 

tccggcggcg tccggtggcg acgctcgccc gtcctgctcg gccacgcccg ctttcccgtc 1320 

gcacgggcgg agagcggcga acagcggtcc gacgaccgcc cggcgctgga ggaggcgatc 1380

cgatgcgcgt cctga 1395



<210> 22 
<211> 429 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca
<400> 22 

Met Thr Ser Phe Ala His Asn Thr His Tyr Tyr Ser Leu Val Pro Leu
1 5 10 15

Ala Trp Ala Leu Arg Ala Ala Gly His Glu Val Arg Val Ala Ser Gln
20 25 30

Pro Ser Leu Thr Asp Thr Ile Val Arg Ser Gly Leu Thr Ala Val Pro
35 40 45

Val Gly Asp Asp Gln Ala Ile Ile Asp Leu Leu Ala Glu Val Gly Gly
50 55 60

Asp Leu Val Pro Tyr Gln Arg Gly Leu Asp Phe Thr Glu Ala Arg Pro
65 70 75 80

Glu Val Leu Thr Trp Glu Tyr Leu Leu Gly Gln Gln Thr Met Leu Thr
85 90 95

Ala Leu Cys Phe Ala Pro Leu Asn Gly Val Ser Thr Met Asp Asp Met
100 105 110

Val Ala Leu Ala Arg Ser Trp Gln Pro Glu Leu Val Ile Trp Glu Pro
115 120 125

Phe Thr Tyr Ala Gly Pro Val Ala Ala Arg Val Val Gly Ala Thr His
130 135 140

Ala Arg Leu Leu Trp Gly Pro Asp Val Val Gly Asn Ala Arg Arg Leu
145 150 155 160

Phe Thr Glu Ser Leu Ala Arg Gln Pro Asp Glu Gln Arg Glu Asp Pro
165 170 175

Met Ala Glu Trp Leu Arg Cys Thr Leu His Arg Tyr Gly Cys Glu Leu
180 185 190

Gly Asp Asp Glu Val Glu Thr Leu Val Thr Gly Gly Trp Thr Ile Asp
195 200 205

Pro Thr Ala Asp Ser Thr Arg Leu Pro Val Pro Gly Arg Arg Val Ala
210 215 220

Met Arg Tyr Thr Pro Tyr Asn Ser Pro Ser Val Val Pro Glu Trp Val
225 230 235 240

Ala Lys Ala Asp Arg Pro Arg Val Cys Leu Thr Leu Gly Val Ser Ser
245 250 255

Arg Glu Thr Tyr Gly Arg Asp Val Val Ser Phe Gln Glu Leu Leu Gly
260 265 270

Ala Leu Gly Asp Leu Asp Val Glu Val Val Ala Thr Leu Ser Asp Ala
275 280 285

Gln Arg Glu Asp Leu Gly Asp Leu Pro Asp Asn Val Arg Val Cys Asp
290 295 300

Phe Val Pro Leu Asp Val Leu Leu Pro Thr Cys Ala Ala Ile Ile His
305 310 315 320

His Gly Gly Ala Gly Thr Trp Ser Thr Ala Met Leu Tyr Gly Val Pro
325 330 335

Gln Ile Met Ile Ala Ser Leu Trp Asp Ala Pro Leu Lys Ala Gln Gln
340 345 350

Ala Glu Arg Leu Gly Thr Gly Ile Ser Ile Pro Pro Glu Arg Leu Asp
355 360 365

Ala Pro Thr Leu Arg Ala Ala Val Val Arg Ile Leu Asp Asp Pro Ser
370 375 380

Ile Ala Ala Ala Ala Arg Arg Gln Arg Asp Glu Leu Arg Ala Ala Pro
385 390 395 400

Ser Pro Ala Glu Val Val Arg Ile Leu Glu Arg Leu Val Ala Asp Asp
405 410 415

Arg Pro Gly Arg Pro Ala Gly Thr Ala Thr Asp His Ser
420 425

<210> 23 
<211> 1290
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 23 

atgacgtcct tcgcgcacaa cacccactac tacagcctgg tgccgttggc ctgggcgctg 60 
cgcgcggccg gccacgaggt acgggtggcg agccagccct cgctcaccga caccatcgtg 120 
cggtcggggc tgaccgcggt gccggtcggc gacgaccagg cgatcatcga cctgctcgcc 180
gaggtcggcg gcgacctggt gccgtaccag cggggactgg acttcaccga ggcccgtccc 240
gaagtgctga cctgggagta tctgctcggg cagcagacca tgctcaccgc gctgtgcttc 300 
gcgccgctca acggcgtctc cacgatggac gacatggtcg ccctggcccg gtcctggcag 360 
cccgagctgg tgatctggga gccgttcacc tacgccgggc cggtcgcggc gcgggtcgtc 420 
ggtgcgacgc acgcccggct gctctggggg ccggacgtgg tcggcaacgc ccggcggctg 480
ttcaccgaga gcctggcgcg gcagccggat gagcagcgcg aggacccgat ggccgagtgg 540
ttgcgctgca ccctgcaccg gtacggctgc gagctcggcg acgacgaggt ggagaccctg 600
gtcaccggcg ggtggaccat cgatcccacc gccgacagca cccggcttcc cgtccccggg 660
cgtcgggtgg ccatgcggta caccccgtac aacagcccgt ccgtggtgcc ggagtgggtg 720
gccaaggccg accggccccg cgtctgcctc accctcggcg tgtcgagccg ggagacgtac 780
ggcagggacg tggtctcctt ccaggagctg ctcggcgccc tcggcgacct ggacgtcgag 840
gtcgtcgcga cgctcagcga cgcccagcgc gaggacctgg gtgacctgcc ggacaacgtc 900
cgggtgtgcg acttcgtgcc gctggacgtg ctgctgccga cctgtgccgc gatcatccac 960
cacggcgggg cgggcacgtg gtcgacggcc atgctctacg gggtgccgca gatcatgatc 1020
gcgtcgctgt gggacgcccc gctcaaggcg cagcaggcgg agcgactcgg cacggggatc 1080
tcgatcccgc cggagcggct cgacgccccg acgctgcggg cggccgtcgt ccggatcctc 1140
gacgacccgt cgatcgccgc cgccgcccgc cgtcagcgcg acgagctgcg tgccgcgccg 1200
tcgccggccg aggtggtccg catcctggaa cgcctcgtcg cggacgaccg gcccggccgg 1260
ccggccggaa ccgccaccga ccactcctga 1290


<210> 24 
<211> 240 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca







<400> 24 

Met Ser Met Met Tyr Ala Asp Ala Ile Ala Glu Val Tyr Asp Leu Ile
1 5 10 15

Tyr Gln Gly Lys Gly Lys Asp Tyr Ala Ala Glu Ala Ala Glu Leu Glu
20 25 30

Ala Leu Ala Arg Ala Arg Arg Pro His Ala Arg Thr Leu Leu Asp Val
35 40 45

Ala Cys Gly Thr Gly Leu His Leu Arg His Leu Ala Gly Leu Phe Asp
50 55 60

Asp Val Gly Gly Ile Glu Leu Ala Pro Asp Met Leu Ser Ile Ala Gln
65 70 75 80

Gln Arg Asn Pro Gly Ala Ala Leu His Leu Gly Asp Met Arg Thr Phe
85 90 95

Asp Leu Gly His Arg Tyr Asp Val Ile Thr Cys Met Phe Ser Ser Val
100 105 110

Gly His Leu Ala Thr Thr Ala Glu Leu Asp Ala Thr Leu Ala Arg Phe
115 120 125

Ala Ala His Leu Ser Pro Gly Gly Val Ala Ile Val Glu Pro Trp Trp
130 135 140

Phe Pro Glu Thr Phe Thr Pro Gly Tyr Val Gly Ala Ser Leu Val Glu
145 150 155 160

Val Asp Gly Arg Thr Ile Ser Arg Val Ser His Ser Val Arg Glu Gly
165 170 175

Gly Ala Thr Arg Ile Thr Val His Tyr Leu Val Ala Ser Pro Gly Gly
180 185 190

Gly Val Arg His Phe Asp Glu Ser His Leu Ile Thr Leu Phe Glu Arg
195 200 205

Ser Asp Tyr Glu Arg Ala Phe Ala Arg Ala Gly Phe Thr Thr Glu Tyr
210 215 220

Leu Thr Pro Gly Pro Ser Gly Arg Gly Leu Phe Val Gly Val His Pro
225 230 235 240

<210> 25 

<211> 723 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 25 

atgtccatga tgtacgcgga cgccatcgcc gaggtctacg acctgatcta ccagggcaag 60 

ggcaaggact acgcggcgga ggcggcggag ctggaggcgc tggcccgggc ccgtcggccg 120 

cacgcccgga cgctgctgga cgtggcgtgc ggcacggggc tgcacctgcg gcacctggcg 180 

gggctcttcg acgacgtggg cggcatcgag ctggcaccgg acatgctgag catcgcccag 240 

cagcgaaacc ccggggcggc cctgcacctc ggcgacatgc ggaccttcga cctggggcac 300

cgctacgacg tcatcacctg catgttcagt tcggtgggcc acctggccac cacggccgag 360 

ctggacgcga cgttggcccg gttcgccgcg cacctgtccc ccgggggagt ggcgatcgtc 420 

gagccgtggt ggttcccgga gaccttcacc cccgggtacg tgggcgcgag cctggtggag 480 

gtcgacggcc gtaccatctc gcgggtctcc cattcggtgc gcgagggcgg cgcgacccgg 540 

atcaccgtgc actacctcgt ggccagcccc ggcgggggag tccggcactt cgacgagagc 600 

cacctgatca ccctcttcga acggtccgac tacgaacgtg ccttcgcccg ggcgggtttc 660 

acgacggagt acctgacgcc cggcccgtcc ggccgcggtc tgttcgtcgg cgtccacccc 720 

723

<210> 26

<211> 1811 

<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 

<400> 26 

Met Pro Asp Thr Pro Glu Leu Asn Arg lie Leu Asp Ala lie Leu Ala 
1 5 10 15 

Gin Glu Thr Asp Ala Arg Glu Leu Ala Ala Leu Pro Leu Pro Ser Ser 
20 25 30 

Tyr Arg Ala Val Thr Val His Lys Asp Glu Thr Gly Met Phe Leu Gly 
35 40 45 

Leu Pro Arg Gin Glu Lys Asp Pro Arg Lys Ser Leu His Thr Glu Glu 
50 55 60 

Val Pro Val Pro Glu Leu Gly Pro Gly Glu Ala Leu Val Ala Val Leu 
65 70 75 80 

Ala Ser Ser Val Asn Tyr Asn Thr Val Trp Ser Ser Leu Phe Glu Pro 
85 90 95 



Leu Pro Thr Phe Gly Phe Leu Glu Arg Tyr Gly Arg Leu Ser Glu Leu 
100 105 110 

Ala Arg Arg His Asp Leu Pro Tyr His lie Leu Gly Ser Asp Leu Ala 
115 120 125 

Gly Val Val Leu Arg Val Gly Pro Gly Val Asn Arg Trp Arg Pro Gly 
130 135 140 

Asp Glu Val Val Ala His Cys Leu Ser Val Glu Leu Glu Ser Ala Asp 
145 150 155 160 

Gly His Gly Asp Thr Met Leu Asp Pro Glu Gin Arg lie Trp Gly Phe 
165 170 175 

Glu Thr Asn Phe Gly Gly Leu Ala Glu He Ala Leu Val Lys Ala Asn 
180 185 190 

Gin Leu Met Pro Lys Pro Asp His Leu Thr Trp Glu Glu Ala Ala Ala 
195 200 205 



Pro Gly Leu Val Asn Ser Thr Ala Tyr Arg Gin Leu Val Ser Gly Asn 
210 215 220 

Gly Ala Arg Met Lys Gln Gly Asp Asn Val Leu Val Trp Gly Ala Ser 
225 230 235 240 

Gly Gly Leu Gly Ala Phe Ala Thr Gln Leu Val Leu Ala Gly Gly Ala 
245 250 255 

Asn Pro Val Cys Val Val Ser Ser Pro Arg Lys Ala Asp Ile Cys Arg 
260 265 270 
Arg Met Gly Ala Glu Ala Val Ile Asp Arg Val Ala Glu Asp Tyr Arg 
275 280 285 

Phe Trp Ser Asp Glu Arg Thr Gln Asn Pro Arg Glu Trp Lys Arg Phe 
290 295 300 

Gly Ala Arg Ile Arg Glu Leu Thr Gly Gly Glu Asp Val Asp Ile Val 
305 310 315 320 

Phe Glu His Pro Gly Arg Glu Thr Phe Gly Ala Ser Val Tyr Val Thr 
325 330 335 

Arg Lys Gly Gly Thr Val Val Thr Cys Ala Ser Thr Ser Gly Phe Glu 
340 345 350 

His Val Tyr Asp Asn Arg Tyr Leu Trp Met Ser Leu Lys Arg Ile Val 
355 360 365 

Gly Thr His Phe Ala Asn Tyr Arg Glu Ala Trp Glu Ala Asn Arg Leu 
370 375 380 

Val Val Lys Gly Lys Ile His Pro Thr Leu Ser Arg Cys Tyr Pro Leu 
385 390 395 400 

Glu Glu Val Gly Gln Ala Val Tyr Asp Val His His Asn Leu His Gln 
405 410 415 

Gly Lys Val Gly Val Leu Ala Leu Ala Pro Arg Glu Gly Leu Gly Val 
420 425 430 

Arg Asn Pro Glu Leu Arg Glu Cys His Leu Ala Ala Ile Asn Arg Phe 
435 440 445 

Arg Val Pro Ala Ala Thr Gly Cys Cys Ala Gly Ala Cys Ala Cys Cys 
450 455 460 

Cys Cys Cys Gly Ala Gly Cys Thr Gly Ala Ala Cys Cys Gly Gly Ala 
465 470 475 480 

Thr Ala Cys Thr Cys Gly Ala Cys Gly Cys Gly Ala Thr Cys Cys Thr 
485 490 495 

Cys Gly Cys Cys Cys Ala Gly Gly Ala Gly Ala Cys Cys Gly Ala Cys 
500 505 510 

Gly Cys Gly Cys Gly Gly Gly Ala Gly Cys Thr Gly Gly Cys Gly Gly 
515 520 525 

Cys Cys Cys Thr Gly Cys Cys Gly Cys Thr Gly Cys Cys Cys Thr Cys 
530 535 540 



Cys Thr Cys Cys Thr Ala Cys Cys Gly Gly Gly Cys Cys Gly Thr Gly 
545 550 555 560 

Ala Cys Gly Gly Thr Gly Cys Ala Cys Ala Ala Gly Gly Ala Cys Gly 
565 570 575 
Ala Gly Ala Cys Gly Gly Gly Gly Ala Thr Gly Thr Thr Cys Cys Thr 
580 585 590 

Gly Gly Gly Cys Cys Thr Thr Cys Cys Cys Cys Gly Cys Cys Ala Gly 
595 600 605 

Gly Ala Gly Ala Ala Gly Gly Ala Cys Cys Cys Gly Cys Gly Cys Ala 
610 615 620 

Ala Gly Thr Cys Gly Cys Thr Gly Cys Ala Cys Ala Cys Gly Gly Ala 
625 630 635 640 

Gly Gly Ala Gly Gly Thr Gly Cys Cys Gly Gly Thr Gly Cys Cys Cys 
645 650 655 

Gly Ala Gly Cys Thr Gly Gly Gly Cys Cys Cys Cys Gly Gly Gly Gly 
660 665 670 

Ala Gly Gly Cys Cys Cys Thr Cys Gly Thr Cys Gly Cys Gly Gly Thr 
675 680 685 

Cys Cys Thr Gly Gly Cys Cys Ala Gly Cys Thr Cys Gly Gly Thr Cys 
690 695 700 

Ala Ala Cys Thr Ala Cys Ala Ala Cys Ala Cys Gly Gly Thr Cys Thr 
705 710 715 720 

Gly Gly Thr Cys Gly Thr Cys Gly Thr Thr Gly Thr Thr Cys Gly Ala 
725 730 735 

Gly Cys Cys Gly Cys Thr Gly Cys Cys Cys Ala Cys Cys Thr Thr Cys 
740 745 750 

Gly Gly Cys Thr Thr Cys Cys Thr Gly Gly Ala Gly Cys Gly Cys Thr 
755 760 765 

Ala Cys Gly Gly Cys Cys Gly Gly Cys Thr Cys Thr Cys Cys Gly Ala 
770 775 780 

Gly Cys Thr Gly Gly Cys Cys Cys Gly Gly Cys Gly Gly Cys Ala Cys 
785 790 795 800 

Gly Ala Cys Cys Thr Gly Cys Cys Gly Thr Ala Cys Cys Ala Cys Ala 
805 810 815 

Thr Cys Cys Thr Cys Gly Gly Cys Thr Cys Gly Gly Ala Cys Cys Thr 
820 825 830 

Gly Gly Cys Cys Gly Gly Cys Gly Thr Gly Gly Thr Gly Cys Thr Gly 
835 840 845 

Ala Gly Gly Gly Thr Cys Gly Gly Gly Cys Cys Cys Gly Gly Cys Gly 
850 855 860 

Thr Cys Ala Ala Cys Cys Gly Cys Thr Gly Gly Cys Gly Gly Cys Cys 
865 870 875 880 
Gly Gly Gly Thr Gly Ala Cys Gly Ala Gly Gly Thr Cys Gly Thr Gly 
885 890 895 

Gly Cys Gly Cys Ala Cys Thr Gly Cys Cys Thr Cys Thr Cys Gly Gly 
900 905 910 

Thr Gly Gly Ala Gly Cys Thr Gly Gly Ala Gly Thr Cys Cys Gly Cys 
915 920 925 

Cys Gly Ala Cys Gly Gly Cys Cys Ala Cys Gly Gly Cys Gly Ala Cys 
930 935 940 

Ala Cys Cys Ala Thr Gly Cys Thr Cys Gly Ala Cys Cys Cys Gly Gly 
945 950 955 960 

Ala Ala Cys Ala Gly Cys Gly Gly Ala Thr Cys Thr Gly Gly Gly Gly 
965 970 975 

Cys Thr Thr Cys Gly Ala Gly Ala Cys Cys Ala Ala Cys Thr Thr Cys 
980 985 990 

Gly Gly Cys Gly Gly Cys Cys Thr Cys Gly Cys Cys Gly Ala Gly Ala 
995 1000 1005 

Thr Cys Gly Cys Gly Thr Thr Gly Gly Thr Cys Ala Ala Gly Gly 
1010 1015 1020 

Cys Gly Ala Ala Cys Cys Ala Gly Cys Thr Gly Ala Thr Gly Cys 
1025 1030 1035 

Cys Cys Ala Ala Ala Cys Cys Cys Gly Ala Cys Cys Ala Cys Cys 
1040 1045 1050 

Thr Gly Ala Cys Cys Thr Gly Gly Gly Ala Gly Gly Ala Gly Gly 
1055 1060 1065 

Cys Cys Gly Cys Cys Gly Cys Gly Cys Cys Gly Gly Gly Ala Cys 
1070 1075 1080 

Thr Gly Gly Thr Cys Ala Ala Cys Thr Cys Cys Ala Cys Cys Gly 
1085 1090 1095 

Cys Cys Thr Ala Cys Cys Gly Cys Cys Ala Gly Cys Thr Gly Gly 
1100 1105 1110 

Thr Cys Thr Cys Cys Gly Gly Cys Ala Ala Cys Gly Gly Gly Gly 
1115 1120 1125 

Cys Cys Cys Gly Gly Ala Thr Gly Ala Ala Gly Cys Ala Gly Gly 
1130 1135 1140 

Gly Cys Gly Ala Cys Ala Ala Cys Gly Thr Cys Cys Thr Cys Gly 
1145 1150 1155 

Thr Cys Thr Gly Gly Gly Gly Gly Gly Cys Cys Ala Gly Cys Gly 
1160 1165 1170 
Gly Cys Gly Gly Thr Cys Thr Cys Gly Gly Cys Gly Cys Gly Thr 
1175 1180 1185 

Thr Cys Gly Cys Cys Ala Cys Cys Cys Ala Gly Cys Thr Cys Gly 
1190 1195 1200 

Thr Gly Cys Thr Gly Gly Cys Cys Gly Gly Cys Gly Gly Gly Gly 
1205 1210 1215 

Cys Cys Ala Ala Thr Cys Cys Cys Gly Thr Cys Thr Gly Cys Gly 
1220 1225 1230 

Thr Gly Gly Thr Cys Thr Cys Cys Ala Gly Cys Cys Cys Gly Cys 
1235 1240 1245 

Gly Cys Ala Ala Gly Gly Cys Cys Gly Ala Cys Ala Thr Cys Thr 
1250 1255 1260 

Gly Cys Cys Gly Thr Cys Gly Gly Ala Thr Gly Gly Gly Cys Gly 
1265 1270 1275 

Cys Cys Gly Ala Gly Gly Cys Cys Gly Thr Cys Ala Thr Cys Gly 
1280 1285 1290 

Ala Cys Cys Gly Gly Gly Thr Cys Gly Cys Cys Gly Ala Gly Gly 
1295 1300 1305 

Ala Cys Thr Ala Cys Cys Gly Cys Thr Thr Cys Thr Gly Gly Thr 
1310 1315 1320 

Cys Cys Gly Ala Cys Gly Ala Gly Cys Gly Cys Ala Cys Cys Cys 
1325 1330 1335 

Ala Gly Ala Ala Thr Cys Cys Cys Cys Gly Gly Gly Ala Gly Thr 
1340 1345 1350 

Gly Gly Ala Ala Gly Cys Gly Cys Thr Thr Cys Gly Gly Cys Gly 
1355 1360 1365 

Cys Ala Cys Gly Cys Ala Thr Thr Cys Gly Gly Gly Ala Gly Cys 
1370 1375 1380 

Thr Gly Ala Cys Cys Gly Gly Ala Gly Gly Cys Gly Ala Gly Gly 
1385 1390 1395 

Ala Cys Gly Thr Cys Gly Ala Cys Ala Thr Cys Gly Thr Cys Thr 
1400 1405 1410 

Thr Cys Gly Ala Gly Cys Ala Cys Cys Cys Cys Gly Gly Cys Cys 
1415 1420 1425 

Gly Gly Gly Ala Gly Ala Cys Gly Thr Thr Cys Gly Gly Cys Gly 
1430 1435 1440 

Cys Cys Thr Cys Gly Gly Thr Cys Thr Ala Cys Gly Thr Gly Ala 
1445 1450 1455 
Cys Cys Cys Gly Cys Ala Ala Ala Gly Gly Ala Gly Gly Cys Ala 
1460 1465 1470 

Cys Cys Gly Thr Gly Gly Thr Cys Ala Cys Cys Thr Gly Cys Gly 
1475 1480 1485 

Cys Cys Thr Cys Gly Ala Cys Gly Ala Gly Cys Gly Gly Thr Thr 
1490 1495 1500 

Thr Cys Gly Ala Gly Cys Ala Cys Gly Thr Cys Thr Ala Cys Gly 
1505 1510 1515 

Ala Cys Ala Ala Cys Cys Gly Thr Thr Ala Cys Cys Thr Gly Thr 
1520 1525 1530 

Gly Gly Ala Thr Gly Thr Cys Cys Cys Thr Gly Ala Ala Gly Cys 
1535 1540 1545 

Gly Cys Ala Thr Cys Gly Thr Cys Gly Gly Cys Ala Cys Gly Cys 
1550 1555 1560 

Ala Cys Thr Thr Cys Gly Cys Cys Ala Ala Thr Thr Ala Cys Cys 
1565 1570 1575 

Gly Gly Gly Ala Gly Gly Cys Gly Thr Gly Gly Gly Ala Ala Gly 
1580 1585 1590 

Cys Cys Ala Ala Cys Cys Gly Gly Thr Thr Gly Gly Thr Gly Gly 
1595 1600 1605 

Thr Cys Ala Ala Gly Gly Gly Cys Ala Ala Gly Ala Thr Cys Cys 
1610 1615 1620 

Ala Cys Cys Cys Gly Ala Cys Gly Cys Thr Gly Thr Cys Gly Cys 
1625 1630 1635 

Gly Cys Thr Gly Cys Thr Ala Cys Cys Cys Gly Cys Thr Gly Gly 
1640 1645 1650 

Ala Gly Gly Ala Gly Gly Thr Cys Gly Gly Cys Cys Ala Gly Gly 
1655 1660 1665 

Cys Gly Gly Thr Cys Thr Ala Cys Gly Ala Cys Gly Thr Cys Cys 
1670 1675 1680 

Ala Thr Cys Ala Cys Ala Ala Cys Cys Thr Gly Cys Ala Cys Cys 
1685 1690 1695 

Ala Gly Gly Gly Cys Ala Ala Gly Gly Thr Cys Gly Gly Cys Gly 
1700 1705 1710 

Thr Gly Cys Thr Cys Gly Cys Gly Cys Thr Cys Gly Cys Gly Cys 
1715 1720 1725 

Cys Gly Cys Gly Cys Gly Ala Gly Gly Gly Gly Cys Thr Cys Gly 
1730 1735 1740 
Gly Gly Gly Thr Cys Cys Gly Gly Ala Ala Cys Cys Cys Gly Gly 
1745 1750 1755 

Ala Gly Cys Thr Gly Cys Gly Gly Gly Ala Ala Thr Gly Cys Cys 
1760 1765 1770 

Ala Thr Cys Thr Thr Gly Cys Cys Gly Cys Gly Ala Thr Cys Ala 
1775 1780 1785 

Ala Cys Cys Gly Cys Thr Thr Cys Cys Gly Gly Gly Thr Gly Cys 
1790 1795 1800 

Cys Gly Gly Cys Cys Thr Gly Ala 
1805 1810 

<210> 27 
<211> 1359 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 27 

atgccagaca cccccgagct gaaccggata ctcgacgcga tcctcgccca ggagaccgac 60 
gcgcgggagc tggcggccct gccgctgccc tcctcctacc gggccgtgac ggtgcacaag 120 
gacgagacgg ggatgttcct gggccttccc cgccaggaga aggacccgcg caagtcgctg 180 
cacacggagg aggtgccggt gcccgagctg ggccccgggg aggccctcgt cgcggtcctg 240 
gccagctcgg tcaactacaa cacggtctgg tcgtcgttgt tcgagccgct gcccaccttc 300 
ggcttcctgg agcgctacgg ccggctctcc gagctggccc ggcggcacga cctgccgtac 360 
cacatcctcg gctcggacct ggccggcgtg gtgctgaggg tcgggcccgg cgtcaaccgc 420 
tggcggccgg gtgacgaggt cgtggcgcac tgcctctcgg tggagctgga gtccgccgac 480 
ggccacggcg acaccatgct cgacccggaa cagcggatct ggggcttcga gaccaacttc 540 
ggcggcctcg ccgagatcgc gttggtcaag gcgaaccagc tgatgcccaa acccgaccac 600 
ctgacctggg aggaggccgc cgcgccggga ctggtcaact ccaccgccta ccgccagctg 660 
gtctccggca acggggcccg gatgaagcag ggcgacaacg tcctcgtctg gggggccagc 720 
ggcggtctcg gcgcgttcgc cacccagctc gtgctggccg gcggggccaa tcccgtctgc 780 
gtggtctcca gcccgcgcaa ggccgacatc tgccgtcgga tgggcgccga ggccgtcatc 840 
gaccgggtcg ccgaggacta ccgcttctgg tccgacgagc gcacccagaa tccccgggag 900 
tggaagcgct tcggcgcacg cattcgggag ctgaccggag gcgaggacgt cgacatcgtc 960 
ttcgagcacc ccggccggga gacgttcggc gcctcggtct acgtgacccg caaaggaggc 1020 
accgtggtca cctgcgcctc gacgagcggt ttcgagcacg tctacgacaa ccgttacctg 1080 
tggatgtccc tgaagcgcat cgtcggcacg cacttcgcca attaccggga ggcgtgggaa 1140 
gccaaccggt tggtggtcaa gggcaagatc cacccgacgc tgtcgcgctg ctacccgctg 1200 
gaggaggtcg gccaggcggt ctacgacgtc catcacaacc tgcaccaggg caaggtcggc 1260 
gtgctcgcgc tcgcgccgcg cgaggggctc ggggtccgga acccggagct gcgggaatgc 1320 
catcttgccg cgatcaaccg cttccgggtg ccggcctga 1359 


<210> 28 

<211> 636 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 





<400> 28 

Val His Gln Ala His Arg Asp Gly Val Asp Gln Ala Thr Leu Asp Arg 
1 5 10 15 

Val Met Ile Ala Lys Arg Leu Ala Leu Glu Leu Arg Glu Val Ile Gly 
20 25 30 

Arg Arg Cys Gln Arg Gln Ala Glu Leu Ala Ala Leu Val Asp Thr Ala 
35 40 45 

Arg Asp Leu Ala Gly Ala Thr Asn Leu Glu Ala Gly Leu Gln Leu Val 
50 55 60 

Val Arg Arg Thr Gln Leu Leu Leu Ala Gly Asp Val Ala Phe Val Ser 
65 70 75 80 

Leu Val Asp Asp Ala Thr Gly Glu Ser Tyr Val Ala Ser Ala Val Gly 
85 90 95 

Ala Ala Thr Ala Leu Thr Ser Gly Tyr Arg Leu Pro Trp Arg Asp Gly 
100 105 110 

Leu Val Val Ala Ala Ala Pro Arg Glu Pro Leu Ser Trp Thr Ala Asp 
115 120 125 

His Leu Ala Asp Glu Arg Leu Glu Arg His Pro Ala Ala Asp Gly Leu 
130 135 140 

Val Arg Ala Glu Gly Leu His Ala Val Leu Ser Val Val Leu Ser Val 

145 150 155 160 

Glu Gly Arg His Leu Gly Asn Leu His Val Gly His Arg Gln Val Arg 
165 170 175 

His Phe Ala Pro Asp Glu Val Ala Ser Leu Arg Leu Leu Ala Asp Leu 
180 185 190 

Ala Ala Thr Ala Val Glu Arg Ile Met Leu Leu Asp Asp Thr Trp Ala 
195 200 205 
Glu Leu Lys Gln Ala Gln Gln Glu Ala Ala Arg Ala Arg Ala Glu Leu 
210 215 220 

Asn Ala Val Arg Met Ala Asp Arg Leu Gln Pro Glu Leu Val Gln Leu 
225 230 235 240 

Ile Leu Asp Gly Gly Glu Leu Asp Asp Leu Val Gly Ser Ala Val Arg 
245 250 255 

Arg Leu Gly Gly Ala Leu His Val Arg Asp Arg Ala Asn Gly Val Leu 
260 265 270 

Ala Ala Ala Gly Glu Ile Pro Val Pro Asn Glu Arg Glu Leu Ala Arg 

275 280 285 

Val Arg Leu Asn Ala His Ala Thr Gly Arg Pro Gly Arg Leu Thr Thr 
290 295 300 

Gly Ser Trp Val Val Pro Leu Ala Ala Arg Ala Gly Asp Leu Gly Cys 
305 310 315 320 

Val Leu Phe His Ala Asp Glu Pro Ser Asp Asp Glu Arg Met Ala Ala 
325 330 335 

Leu Pro Ala Val Ala Gln Thr Val Ala Leu Leu Met Thr Arg Asn Gly 
340 345 350 

Gly Ser His Gly Gln Pro Gly Asp Gly Leu Leu Glu Asp Leu Leu Gly 
355 360 365 

Pro Trp Pro Asp Leu Glu Arg Gly Gly Lys Arg Arg Arg Tyr Thr Pro 

370 375 380 

Val Glu Phe Asp Arg Pro Tyr Val Val Val Val Ala Arg Pro Glu Gly 
385 390 395 400 

Ala Thr Ser Pro Arg Val Phe Glu Arg Ala Val Ser Val Ala His Gly 
405 410 415 

Leu Asn Gly Met Lys Ala Ile Arg Asp Gly Gln Ala Val Leu Leu Leu 
420 425 430 

Pro Gly Asp Asp Pro Gly Ala Arg Ala Arg Asp Val Thr Arg Glu Leu 
435 440 445 

Ser Gly Leu Leu Gly Leu Pro Val Thr Ala Gly Gly Ala Gly Pro Val 
450 455 460 

Arg Thr Ala Asp Ser Val Ser Arg Thr Tyr Gln Glu Ala Ala Arg Cys 
465 470 475 480 

Val Asp Ala Leu Ala Ala Leu Asp Ala Lys Gly Arg Ala Ala Cys Ser 
485 490 495 



Arg Asp Leu Gly Phe Leu Gly Leu Leu Val Ala Gly Gly His Asp Val 
500 505 510 
Thr Gly Phe Val Asp Arg Val Ile Gly Pro Val Leu Ser Tyr Asp Ala 
515 520 525 

Arg Arg Leu Thr Asn Leu Arg Glu Thr Leu Gln Thr Tyr Phe Asp Ser 
530 535 540 

Ala Gly Ser Arg Thr Arg Ala Ala Glu Met Leu His Leu His Pro Asn 
545 550 555 560 

Thr Val Ser Arg Arg Leu Asp Arg Ile Ser Gln Leu Leu Gly Arg Asp 
565 570 575 

Trp Arg Gln Pro Asp Arg Ala Leu Asp Thr Gln Leu Ala Leu Arg Leu 
580 585 590 

His Arg Ile Arg Gly Leu Leu Cys Gln Glu Arg Gly Tyr Pro Gly Pro 
595 600 605 

Ser Gln Glu Pro Asp Gln Pro Ala Arg Pro Ile Arg Arg His Arg Pro 
610 615 620 

Pro Ala Ser Ala Gly Arg Ala Pro Arg Thr Pro Arg 
625 630 635 

<210> 29 
<211> 1911 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 29 

gtgcaccagg cgcaccggga cggagtggac caggccacgc tcgaccgggt gatgatcgcc 60 

aagcgactcg cgttggagct tcgagaggtc atcgggaggc ggtgtcagcg gcaggcggag 120 

ctggccgccc tcgtcgacac cgcccgtgac ctcgccgggg cgacgaacct ggaggccggg 180 

ctgcagctgg tggtgcggcg gacccaactg ctgctcgccg gggacgtggc gttcgtcagc 240 

ctcgtcgacg acgcgaccgg cgaatcctac gtcgcctcgg ccgtcggggc ggccaccgcg 300 

ctgaccagcg gctaccggct gccctggcgc gacgggctgg tcgtggccgc cgcaccgcgc 360 

gagccactct cctggacggc ggaccacctc gccgacgagc gcctcgaacg acacccggcc 420 

gccgacggcc tggtccgcgc ggaagggctg cacgcggtgc tgtccgtggt tctgagcgtc 480 

gagggccggc acctcggcaa cctgcacgtc ggccaccggc aggtccgcca cttcgccccg 540 

gacgaggtcg cgtcgctgcg cctgctcgcc gatctcgcgg cgacggcagt ggagcggatc 600 

atgctgctcg acgacacgtg ggccgaactc aagcaggccc agcaggaggc ggccagggcc 660 

cgagccgagc tgaacgcggt ccgcatggcc gaccgcctgc aacccgaact cgtccagctc 720 

atcctcgacg gcggcgaact cgacgacctg gtgggcagcg ccgtgcggcg actgggcggc 780 
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tycgtyaccy 


ggccaacgyv; 


y »-y c 1-y y ^y y 


coaccaalioa aaticcctotc 


840 


ri *^ rt a ia rvra rrr^ 


y y y o-oc uy y c 


ccyay tycyy 


nf" craaccrccc 


acaccacccrcr cccraccccrcrc 


900 


(^/T<— 1 /-» /~> 3 


ooyy CL'V^L* L.y 


yy i-yy uy l^l.c- 


r* i" CTcrncicr n p c 
^ uyy wvjy I- 


gcgccggtga cctcggctgt 


960 


y i*y t» u y u I- VrfO 


d v« y wy Gi wy ci 




cracaacicaaa 


taacaaccct accciQCQQtc 


1020 


ycycciycLwcy 


uy y i-ry w L^y V* I. 


y oi uy ciUtwcLvjvj 




accacaac ca occcrcrocGrac 


1080 


yyycuwuL.yy 


ciyyowwoy w L« 


^yy v*^oy uyy 


cccraac c t aa 


aocaaaaccrcr aaacrcQCCcxt^ 


1140 


r^r^f^ a "> 1^ a /-I 

Cy y u clijciucic 


L^uy I'wyciy l. u 


oy ^y y i.^ ^ v« 




taatcaccca ccccaaQcrcrc 


1200 


yCCcLCCuCyC 


cccggyi.gi.1;. 


cyaduyyycy 


y uw u wwy uwy 


pppapcrappt craaccfacaiiQ 


1260 


-5 ^ « /"I a r^r^ 

aayyccaucc 


y y y ^cy y c c ei 


gycyytyct-g 


^ L>y o L»vj wwv^*j 


cit.a^cicx^ci c c aaacrorccccra 

y ^y**»— yawwVi- 33*3353 wj3*3 


1320 


yCCCyyyACy 


L.ya.uycyyya. 


cit- uy ay ^yyy 


Pt* CTP ^ PCTQC C 


tacccratcac crcrccccraGrcrc 


1380 


yccyyacuyy 


L.y ^y ^cL^y y 


y y ci w t. y y w V* 


acrcpCTPaccfc 

Ot*H Ww*-! WB»wW w 


accaacraQQC ciQcccQQtcrc 


1440 


yuCyaCyucc 


tyy ucycy t-t 


y y d v^y v-y ciciy 


QCTCTP OaCf P OCT 

yyy^yyy-yy 


cctactcaca' craacctcjcrcrc 


1500 


UuCCCCyyyC 


cgc uggc eye 


wy y eyy w (^aL* 


rra orrf* na cca 
y dVafy uvjatuv^y 


at* t" t" port" pcra cccrcrcrtcatc 


1560 


yyoicccyuyc 


♦"^a^^l" a ^Ofa 

uyayCuacyo 


cycgcguuyy 


Pi" pa poa a t" p 


t"paacrcfaaac cctccaoacc 


1620 




t-y y t-y y y dy 




y (-y y v-y y ay a 


tgctigcatct gcatccgaac 


1680 


a(JCy l«y C^iJU 


gccgyutyya 




p a cr p t" CI p t ccr 


crcccracractcr crccrcrcacrcccr 


1740 


gaccgggccc tcgacacgca gctcgctctg cgcctgcacc ggatccgtgg cctgctctgc 1800 

caggaacggg gctacccggg cccatcgcag gagccggacc aacccgcgcg gcctatccgg 1860 

cggcaccgcc ctccagcatc cgcagggcgt gcgccacgga cgccaaggtg a 1911 



<210> 30 
<211> 403 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 30 

Met Val Pro Thr Leu Asp Val Arg Glu Glu Val Thr Ala Ala Arg Ser 
1 5 10 15 

Asp Pro Asp Thr Val Ser Arg Phe Cys Ala Ala Leu Leu Ala Ser Leu 
20 25 30 

Pro Arg Ala Asp Gln Arg Arg Lys Gly Glu Leu Tyr Val Arg Gly Leu 
35 40 45 

Leu Thr Ala Ser Gly Arg Lys Thr Met Arg Asn Leu Ala Ala Ile Ala 
50 55 60 
Asp Asp Pro Ala Ala Ala Gln Ser Met His His Phe Ile Ser Cys Ser 
65 70 75 80 

Thr Trp Asp Trp Glu Thr Val Arg Ala Ala Leu Ala Gly His Leu Asp 
85 90 95 

Arg Thr Leu Ser Pro Arg Ala Trp Val Val Arg Ser Met Leu Val Pro 
100 105 110 

Lys Thr Gly Arg His Ser Val Gly Val Glu Arg Arg Tyr Val Pro Ala 
115 120 125 

Leu Gly Glu Thr Val Asn Ser Gln Gln Ser Tyr Gly Leu Trp Leu Ala 
130 135 140 

Ser Glu Thr Val Ala Ala Pro Ile Asn Trp Gln Leu Ser Ile Gly Lys 
145 150 155 160 

Gly Trp Leu Gln Asp Asn Arg Ala Arg Ala Ser Val Pro Ala Asp Glu 
165 170 175 

Asp Gly Thr Thr Ser Asp Gly Ala Ala Val Gln Ala Val Leu Lys Ala 
180 185 190 

Ala Ala Trp Gly Ile Gly Pro Arg Pro Val Val Met Asp Ala Arg His 
195 200 205 

Ser Ala Leu Pro Pro Leu Ile Glu Ala Phe Thr Thr Ala Gly Leu Pro 
210 215 220 

Phe Leu Leu Arg Ile Asn Ser Gly Cys Thr Leu Leu Ala Ala Gly Pro 
225 230 235 240 

Gly Pro Arg Glu Asn Arg Val Ala Ala Ala Ser Ala Glu His Leu Leu 
245 250 255 

Ser Leu Thr Arg Ala Gln Arg Arg Pro Val Glu Trp Ile Asp Pro Ala 
260 265 270 

Ser Pro Gly Ala Arg Arg Thr Ser Leu Val Ala Pro Leu Gln Val Tyr 
275 280 285 

Trp Pro Gly Leu Ser Gly Ala Arg Pro Pro Gly Pro Ser Ala Pro Ala 
290 295 300 

Pro Pro Gly Ala Ala Arg Ala Ala Ala Pro Gly Leu Pro Leu Thr Leu 
305 310 315 320 

Leu Gly Lys Trp Gln Thr Tyr Glu Arg Gly Val Arg Gln Met Trp Leu 
325 330 335 

Thr Asn Met Thr Asp Ala Gly Tyr Gly Pro Leu Leu Arg Leu Ser Lys 
340 345 350 



Leu Thr Arg Arg Val Glu Thr Asp Phe Ser Gln Val Ser Leu Asp Val 
355 360 365 
Gly Ile Gln Asp Phe Glu Gly Arg Ser Tyr Gln Gly Trp His Arg His 
370 375 380 

Val Thr Leu Ala Ser Val Ala His Ala Leu Arg Met Leu Glu Gly Gly 
385 390 395 400 

Ala Ala Gly 



<210> 31 

<211> 1212 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 






<400> 31 
atggtgccga cgctcgacgt ccgcgaggag gtgaccgcgg caaggtccga tccggacacc 60 
gtgtcccggt tctgcgccgc cctgctggcc tcgctgcccc gcgccgacca gcgacgcaag 120 
ggcgaactgt acgtccgggg gctgctgacc gcctccggcc gcaagaccat gcgcaacctg 180 
gccgccatcg ccgacgatcc ggcggcggca cagagcatgc accacttcat cagttgctcc 240 
acctgggact gggagaccgt ccgtgccgcg ctcgccggcc acctggaccg gacgctgtcg 300 
ccccgggcct gggtggtgcg gtcgatgctg gtgccgaaga ccggccggca ctcggtcggc 360 
gtggaacgcc ggtacgtgcc cgcgctgggc gagacggtca acagccagca gagctacggc 420 
ctctggctgg cctcggagac cgtcgccgcg cccatcaact ggcagttgtc catcggtaag 480 
ggttggctcc aggacaaccg cgcccgcgcg agcgtaccgg cggacgagga cggcacgacc 540 
agcgacggcg cggcggtgca ggcggtgctg aaggccgcgg cctggggaat cggccctcgc 600 
ccggtggtaa tggacgcccg gcactcggcg ctgcccccgc tgatcgaggc gttcaccacg 660 
gcgggtctgc ccttcctgct acggatcaac agcggctgca ccctgctggc cgccgggccc 720 
ggcccgcgcg agaaccgggt cgcggcggcc tccgccgagc acctgctcag cctgacgcgg 780 
gcccagcgcc gtccggtgga gtggatcgac ccggccagcc ccggcgcacg gcgcacgagc 840 
ctggtcgcac cgctacaggt ctattggccg ggcctgtccg gtgcccgccc gcccggtccg 900 
tccgccccgg ccccgccggg ggcggcgcgc gccgccgcgc ccgggctgcc cctgacactg 960 
ctcggcaagt ggcagaccta cgagcgcggc gtacggcaga tgtggctgac caacatgacc 1020 
gacgccgggt acggcccact gctgcggctg agcaagctca cccggcgggt cgagaccgac 1080 
ttctcccagg tcagcctcga cgtcggcatc caagacttcg agggtcggtc ataccaaggc 1140 
tggcaccggc acgtcacctt ggcgtccgtg gcgcacgccc tgcggatgct ggagggcggt 1200 
gccgccggat ag 1212 
<210> 32 

<211> 481 

<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 

<400> 32 



Met Thr Ser Ala Ala His His Ser Pro His Pro Ala Lys Ala Asp Ala 
1 5 10 15 

Leu Met Asp Asp Ala His Ala Asp Ile Gly Ala Asp Ala Glu Ala Asp 
20 25 30 

Gly Arg Arg Leu Asp Arg Ala Ala Leu Arg Arg Val Ala Gly Leu Ser 
35 40 45 

Thr Glu Arg Ala Asp Val Thr Glu Val Glu Tyr Arg Gln Val Arg Leu 
50 55 60 

Glu Arg Val Val Leu Val Gly Val Trp Thr Ser Gly Thr Ala Asp Glu 



65 70 75 80 



Ala Glu Arg Ser Leu Ala Glu Leu Ala Ala Leu Ala Glu Thr Ala Gly 
85 90 95 

Ala Val Val Leu Asp Gly Val Ile Gln Arg Arg Asp Arg Pro Asp Pro 
100 105 110 

Ala Thr Tyr Ile Gly Ser Gly Lys Ala Arg Glu Leu Arg Asp Ile Val 
115 120 125 

Gln Glu Val Gly Ala Asp Thr Val Ile Cys Asp Gly Glu Leu Ser Pro 
130 135 140 

Ala Gln Leu Val Arg Leu Glu Glu Val Val Asp Ala Lys Val Val Asp 
145 150 155 160 

Arg Thr Ala Leu Ile Leu Asp Ile Phe Ala Gln His Ala Thr Ser Arg 
165 170 175 

Glu Gly Lys Ala Gln Val Ala Leu Ala Gln Met Gln Tyr Met Leu Pro 
180 185 190 

Arg Leu Arg Gly Trp Gly Gln Ser Leu Ser Arg Gln Met Gly Gly Gly 
195 200 205 

Ala Gly Gly Gly Gly Met Ala Thr Arg Gly Pro Gly Glu Thr Lys Ile 
210 215 220 

Glu Thr Asp Arg Arg Arg Ile His Glu Arg Met Ala Arg Leu Arg Arg 
225 230 235 240 

Glu Ile Ala Glu Met Lys Ser Gly Arg Glu Leu Lys Arg Arg Asp Arg 
245 250 255 






Arg Arg Asn Ser Val Pro Ser Val Ala Ile Ala Gly Tyr Thr Asn Ala 
260 265 270 

Gly Lys Ser Ser Leu Leu Asn Arg Leu Thr Gly Ala Ser Val Leu Val 
275 280 285 

Gln Asn Ala Leu Phe Ala Thr Leu Asp Pro Thr Val Arg Arg Ala Thr 
290 295 300 

Thr Pro Ser Gly Arg Ser Tyr Thr Ile Thr Asp Thr Val Gly Phe Val 
305 310 315 320 

Arg His Leu Pro His His Leu Val Glu Ala Phe Arg Ser Thr Leu Glu 
325 330 335 

Glu Val Ala Glu Ala Asp Leu Leu Leu His Val Val Asp Gly Ala His 
340 345 350 

Pro Ala Pro Leu Glu Gln Leu Ala Ser Val Arg Ala Val Ile Arg Asp 
355 360 365 

Val Asp Ala Ala Gly Val Pro Glu Leu Val Val Ile Asn Lys Ala Asp 
370 375 380 

Ala Ala Thr Pro Ala Ala Leu Ala Ala Leu Ala Glu Ala Glu Pro His 
385 390 395 400 

His Val Val Val Ser Ala Arg Thr Gly Gln Gly Ile Asp Thr Leu Arg 
405 410 415 

Gln Leu Leu Glu Ala Ala Leu Pro His Arg Glu Val Arg Val Asp Val 
420 425 430 

Leu Ile Pro Tyr Val Ala Gly Ser Leu Val Ala Arg Val His Ala Asp 
435 440 445 

Gly Glu Val Leu Ala Glu Glu His Thr Ala Asp Gly Thr Leu Leu Gln 
450 455 460 

Ala Arg Val Ala Pro Asp Leu Ala Ala Glu Leu Ser Ala Tyr Ala Arg 
465 470 475 480 



Thr 



<210> 33 
<211> 1446 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 33 

atgacctccg cagcgcacca ttccccgcat ccggcgaagg ccgacgccct gatggacgac 60 

gcccacgccg acatcggggc cgatgccgag gccgacggtc gacggctcga ccgggccgcc 120 

ctgcggcggg tcgccgggct gtcgaccgag agggccgacg tcacggaggt cgagtaccgg 180 

caggtgcggc tggagcgcgt cgtcctggtc ggcgtgtgga cctcgggcac cgccgacgag 240 

gccgaacggt ccctcgccga gctggcggca ctcgccgaga ccgcgggagc cgtggtgctc 300 

gacggggtga tccagcgccg cgaccggccc gacccggcga cgtacatcgg ctccggcaag 360 

gcgcgggagt tgcgggacat cgtccaggag gtgggggccg acacggtgat ctgcgacggt 420 

gagctgagcc cggcccaact ggtacgcctc gaagaggtcg tcgacgccaa ggtggtggac 480 

cgcaccgcgc tgatcctcga catcttcgcc cagcacgcca cgtcccgcga ggggaaggcg 540 

caggtggccc tggcacagat gcaatacatg ctgccgcggc tgcgcggctg gggccagtcg 600 

ctctcccggc agatgggcgg aggtgccggc ggcggtggca tggccacccg ggggcccggc 660 

gagaccaaga tcgagaccga ccggcggcgc atccacgaga ggatggcccg gctccgacgg 720 

gagatcgcgg agatgaagtc cggccgcgaa ctcaagcgcc gcgatcggcg gcgcaacagc 780 

gtcccgtcgg tcgcgatcgc cggttacacc aacgccggca agtcctcgct gctcaaccgg 840 

ctcactggcg cgagcgtgct ggtgcagaac gcgctgttcg ccaccctcga cccgacggtg 900 

cgccgggcca ccaccccgag cgggcgcagc tacacgatca ccgacaccgt cggattcgtc 960 

cggcacctgc cgcaccacct ggtggaggcg ttccgctcca ccctggaaga ggtggccgag 1020 

gccgacctcc tgctgcacgt ggtggacggc gcccaccccg ccccgctgga gcagctcgcc 1080 

tcggtgcgcg cggtcatccg ggacgtggac gcggcgggag tgcccgaact cgtcgtgatc 1140 

aacaaggccg acgccgccac cccggccgcc ctggccgcgt tggcggaggc cgagccgcac 1200 

cacgtcgtcg tctcggcccg caccggtcag ggcatcgaca cgcttcggca gttgctggag 1260 

gccgcgctgc cgcaccggga ggtccgggtc gacgtcctga tcccgtacgt cgcgggcagc 1320 

ctcgtggccc gggtgcacgc cgacggcgag gtgctggccg aggagcacac ggccgacggc 1380 

accctgctgc aggcgcgggt ggcccccgac ctggctgccg agctcagcgc gtacgccagg 1440 

acctga 1446 

<210> 34 
<211> 408 
<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 
<400> 34 

Met Lys Arg Asp Leu Gly Asp Leu Ala Leu Phe Gly Gly His Ala Ser 
1 5 10 15 

Phe Leu Gln Gln Ile His Val Gly Arg Pro Asn Arg Ile Asp Arg Ala 
20 25 30 

Arg Leu Phe Asp Arg Leu Ser Trp Ala Leu Asp Asn Glu Trp Leu Thr 
35 40 45 

Asn Asn Gly Pro Leu Ala Arg Glu Phe Glu Glu Arg Val Ala Asp Met 
50 55 60 

Val Gly Val Gly Asn Cys Val Ala Thr Cys Asn Ala Thr Val Ala Leu 
65 70 75 80 

Gln Leu Leu Ala His Ala Thr Glu Leu Thr Gly Glu Val Ile Met Pro 
85 90 95 

Ser Leu Thr Phe Ala Ala Thr Ala His Ala Val Arg Trp Leu Gly Leu 
100 105 110 

Glu Pro Val Phe Cys Asp Ile Asp Pro Arg Thr Gly Cys Leu Asp His 
115 120 125 

Val Ala Val Ala Ala Ala Ile Thr Pro Arg Thr Ser Ala Val Phe Gly 
130 135 140 

Val His Leu Trp Gly Arg Pro Cys Asp Val Asn Ala Leu Glu Lys Val 
145 150 155 160 

Thr Ala Asp Ala Gly Leu Arg Leu Phe Phe Asp Ala Ala His Ala Ile 
165 170 175 

Gly Cys Thr Ser Gln Gly Arg Pro Val Gly Arg Phe Gly His Ala Glu 
180 185 190 

Val Phe Ser Phe His Ala Thr Lys Val Val Asn Ala Phe Glu Gly Gly 
195 200 205 

Ala Ile Val Thr Asp Asp Asp Asp Leu Ala His Arg Val Arg Ser Leu 
210 215 220 

Ala Asn Phe Gly Phe Gly Leu His Ser Pro Ser Ala Ala Gly Gly Thr 
225 230 235 240 

Asn Ala Lys Met Ser Glu Ala Ser Ala Ala Met Gly Leu Thr Ser Leu 
245 250 255 

Asp Ala Phe Pro Glu Val Ala Arg His Asn Gln Ala Asn Tyr Glu Gln 
260 265 270 

Tyr Cys Gly Glu Leu Ala Arg Ile Pro Gly Leu Ser Val Ile Asp Phe 
275 280 285 

Ala Pro Asp Glu Arg His Asn Tyr Gln Tyr Val Ile Val Glu Ile Asp 
290 295 300 

Pro Asp Val Thr Gly Leu His Arg Asp Leu Leu Val Asp Leu Leu Arg 
305 310 315 320 

Ala Glu Asn Val Val Ala Gln Arg Tyr Phe Ser Pro Ala Cys His Gln 
325 330 335 
Leu Glu Pro Tyr Arg Ser Arg Gln Gln Phe Gln Leu Pro His Thr Glu 
340 345 350 

Arg Leu Ser Ala Arg Val Leu Ala Leu Pro Thr Gly Ser Ala Ile Ser 
355 360 365 

Arg Glu Asp Ile Arg Arg Val Cys Asn Ile Val Arg Leu Ala Val Ser 
370 375 380 

Arg Gly Phe Glu Leu Thr Ala Arg Trp Gln Gln Gln Pro Gly Pro Asp 
385 390 395 400 

Gly Gln Ser Val Val Ala Pro Gly 
405 



<210> 35 
<211> 1227 
<212> DNA 

<213> micromonospora carbonacea subspecies aurantiaca 




<400> 35 
atgaagcgag atctcgggga tctggcactc ttcggaggac acgccagctt cctccagcag 60 
atccacgtcg ggcgccccaa ccggatcgat cgggccaggc tgttcgaccg gctgtcctgg 120 
gcgctcgaca acgagtggtt gaccaacaac gggccgctgg cacgggagtt cgaggagcgg 180 
gtcgccgaca tggtcggggt cggcaactgc gtggcgacgt gcaacgccac ggtggccctc 240 
cagctgctcg cgcacgccac cgagctgacc ggtgaggtga tcatgccatc gctcaccttc 300 
gccgcgaccg cacacgcggt gcgctggctc gggctggagc cggtcttctg cgacatcgac 360 
ccgcgcaccg gatgcctcga ccacgtggcg gtcgccgcgg ccatcacgcc gcgcacgtcg 420 
gcggtcttcg gcgtccacct ctggggccgc ccctgcgacg tcaacgcgct ggagaaggtg 480 
accgccgacg cgggcctgcg cctgttcttc gacgccgccc acgccatcgg gtgcacctca 540 
cagggccgcc cggtggggcg gttcggccac gccgaggtgt tcagcttcca cgcgacgaag 600 
gtcgtcaacg ccttcgaggg cggggcgatc gtcaccgacg acgacgacct cgcccaccgc 660 
gtccgctccc tggcgaactt cggcttcggc ctgcacagcc ccagcgcggc cggcggcacc 720 
aacgcgaaga tgagcgaggc gtccgccgcc atggggctca cctcgctcga cgcgttcccc 780 
gaggtggccc gccacaacca ggccaactac gagcagtact gcggtgagct ggcccggatt 840 
cccggcctca gcgtgatcga cttcgccccc gacgagcggc acaactacca gtacgtgatc 900 
gtcgagatcg acccggacgt caccgggttg caccgcgacc tgctcgtcga cctgctccgg 960 
gccgagaacg tcgtggcgca gcgctacttc tcgccggcct gtcaccaatt ggagccctac 1020 
cggtcccggc agcagttcca gctgccgcac accgagcggc tctcggcgcg cgtcctggcg 1080 
ctgccgaccg gctccgccat ctcccgggaa gacatccgca gggtgtgcaa catcgtgcgg 1140 
ttggcggtct cccggggatt cgaattgacc gctcggtggc agcagcagcc cgggcccgac 1200 
ggacagagcg tggtggcacc cggttga 1227 



<210> 36 

<211> 488 

<212> PRT 

<213> micromonospora carbonacea subspecies aurantiaca 

<400> 36 

Val Gly Gly Pro Val Thr Met Glu Ile Ser Ala Ser Asn Pro Val Ala 
1 5 10 15 

Thr Cys Ala Val Pro Gly Ser Asp Pro Thr Ala Ala Ala Arg Val Leu 
20 25 30 

Tyr Asp Glu Val Ala Gly Ser Gly Ile Val Pro Pro Ala Glu Ile Gly 
35 40 45 

Ala Ala Ala Gln Gly Leu Val Ala Leu Ala Arg Ile Tyr Gly Thr Thr 
50 55 60 

Pro Phe Leu Pro Leu Glu Gln Ala Arg Arg Glu Ile Gly Leu Asp Arg 
65 70 75 80 

Ala Gly Phe Gly Arg Leu Leu Asp Leu Phe Ala Arg Ile Pro Gly Leu 
85 90 95 

Arg Thr Ala Val Glu Asn Gly Pro Ser Gly Arg Tyr Trp Thr Asn Thr 
100 105 110 

Val Leu Gly Leu Glu Arg Ala Gly Val Phe Asp Ala Val Leu Asp Arg 
115 120 125 

Arg Pro Ala Phe Pro His Leu Val Gly Leu Tyr Pro Gly Pro Thr Cys 
130 135 140 

Met Phe Arg Cys His Phe Cys Val Arg Val Thr Gly Ala Arg Tyr Gln 
145 150 155 160 

Ala Ser Ala Leu Asp Asp Gly Asn Ala Met Phe Ala Ser Val Ile Asp 
165 170 175 

Glu Val Pro Ala His Asn Arg Asp Ala Val Tyr Val Ser Gly Gly Leu 
180 185 190 

Glu Pro Leu Thr Asn Pro Gly Leu Gly Ala Leu Val Ser Arg Ala Ala 
195 200 205 

Glu Arg Gly Phe Arg Ile Ile Leu Tyr Thr Asn Ser Phe Ala Leu Thr 
210 215 220 
Glu Gln Lys Leu Lys Gly Glu Arg Gly Leu Trp Ser Leu His Ala Ile 
225 230 235 240 

Arg Thr Ser Leu Tyr Gly Leu Asn Asp Glu Glu Tyr Arg Ala Thr Thr 
245 250 255 

Gly Lys Gln Gly Ala Phe Thr Arg Val Arg Ala Asn Leu Thr Arg Phe 
260 265 270 

Gln Gln Leu Arg Ala Glu Arg Gly Glu Pro Val Arg Leu Gly Leu Ser 
275 280 285 

Tyr Ile Val Leu Pro Gly Arg Ala Gly Arg Leu Ser Ala Leu Ile Asp 
290 295 300 

Phe Val Ala Glu Leu Asn Glu Ala Ala Pro Asp Arg Pro Leu Asp Tyr 
305 310 315 320 

Ile Asn Leu Arg Glu Asp Tyr Ser Gly Arg Pro Asp Gly Lys Leu Ser 
325 330 335 

Leu Asp Glu Arg Ala Glu Leu Gln Ala Glu Leu His Arg Phe Arg Glu 

340 345 350 

Arg Ala Met Gln Arg Thr Pro Thr Leu His Ile Asp Tyr Gly Tyr Ala 
355 360 365 

Leu His Ser Leu Met Thr Gly Ser Asp Val Glu Leu Val Arg Ile Arg 
370 375 380 

Pro Glu Thr Met Arg Pro Ala Ala His Pro Gln Val Ser Val Gln Val 
385 390 395 400 

Asp Ile Leu Gly Asp Val Tyr Leu Tyr Arg Glu Ala Ala Phe Pro Gly 
405 410 415 

Leu Ala Gly Ala Asp Arg Tyr Arg Ile Gly Thr Val Ser Pro Gly Thr 
420 425 430 

Thr Leu Ala Gln Val Val Glu Thr Phe Val Thr Ser Gly Gly Ser Val 
435 440 445 

Val Ala Lys Pro Gly Asp Glu Tyr Phe Leu Asp Gly Phe Asp Gln Ala 
450 455 460 

Val Thr Ala Arg Leu Asn Gln Met Glu Thr Asp Val Ala Asp Gly Trp 
465 470 475 480 

Gly Asp Arg Arg Gly Phe Leu Arg
485

<210> 37
<211> 1467
<212> DNA
<213> Micromonospora carbonacea subspecies aurantiaca
<400> 37
gtgggagggc ccgtgaccat ggagatctcc gcctcgaatc ccgtggcgac ctgcgctgtc 60
cccggcagcg acccgaccgc ggcggcgcgc gtgctgtacg acgaggtcgc cgggtcagga 120
atcgtgccgc cggcagagat cggggccgcc gcccaggggt tggtggcatt ggcacgcatc 180
tacgggacca cacctttuCt gccgcttgag caggcccgcc gcgaaatcgg cctggaccgg 240
gccgggttcg ggcggctgct ggacctgttc gcccggattc ccgggttgcg caccgcagtg 300
gagaacggac cgtccggtcg ctactggacc aacacggtgc tcggcctcga aagggccggc 360
gtcttcgacg ccgtgctcga ccggaggccg gcgtttccgc atctcgtcgg gctctacccg 420
ggccccacgt gcatgttccg ctgtcacttc tgcgtaaggg tcaccggggc ccgctaccag 480
gcctcggcgc tggacgacgg gaacgccatg ttcgcctctg tcatcgacga ggtccccgcg 540
cacaaccgcg acgcggtgta cgtctccggt ggcctcgagc cactcaccaa ccccgggctc 600
ggtgcactgg tcagccgggc ggccgagcgg ggatttcgga ccacccticiia caccaactcg 660
ttcgccctca cggagcagaa gctcaagggt gagcggggat tgtggagcct gcacgccatc 720
cgcacgtcgc tgtacgggtt gaacgacgag gaataccggg cgaccaccgg caagcagggg 780
gccttcaccc gggtacgggc gaacctcacg cggttccagc agctgcgtgc cgagcggggc 840
gagccggtgc ggctcggcct cagctacatc gtcctgcccg gccgcgccgg gcggctgagc 900
gcgctgatcg acttcgtcgc cgagctcaac            cggaccgccc gctggactac 960
atcaacctgc gggaggacta cagcgggcgg ccggacggga agctctccct ggacgagcgc 1020
gccgagctcc aggccgagct gcaccggttc cgggagaggg caatgcagcg gacgccgacc 1080
ctgcacatcg actacggcta cgccctgcac agcctgatga cgggaagcga cgtggagctc 1140
gtgcgtatcc ggccggagac gatgcgccct gcggcccacc cgcaggtgtc ggtgcaggtg 1200
gatatcctcg gtgatgtcta cctctatcgg gaggcggcgt ttccgggcct ggccggtgcc 1260
gaccgctatc gcatcggcac ggtatctccc ggcacgacgt tggcgcaggt ggtggagacg 1320
ttcgtgacca gcggcggatc ggtggtcgcg aagcctggcg acgaatactt cctggacgga 1380
ttcgaccagg cggtgaccgc gcggctgaac cagatggaga ccgacgtcgc cgatggctgg 1440
ggagaccgac ggggtttcct ccgctga 1467

<210> 38
<211> 277
<212> PRT
<213> Micromonospora carbonacea subspecies aurantiaca
<400> 38
Met Pro Tyr Ile Gln His Ala Gly Arg His Glu Phe Gly Gln Asn Phe
1 5 10 15 

Leu Val Asp Arg Ser Val Ile Asp Asp Phe Val Glu Leu Val Ala Arg
20 25 30

Thr Asp Gly Pro Ile Val Glu Ile Gly Ala Gly Asp Gly Ala Leu Thr
35 40 45

Leu Pro Leu Ser Arg Gln Gly Arg Glu Leu Thr Ala Val Glu Ile Asp
50 55 60

Ser Lys Arg Ser Lys Arg Leu Ser Arg Gln Thr Pro Asp Asn Val Thr
65 70 75 80 

Val Val Cys Ala Asp Val Leu Ser Phe Arg Phe Pro Gln His Pro His
85 90 95

Val Val Val Gly Asn Ile Pro Phe His Val Thr Thr Pro Ile Val Arg
100 105 110

Ala Leu Leu Ala Ala Asp His Trp His Thr Ala Val Leu Leu Val Gln
115 120 125

Trp Glu Val Ala Arg Arg Arg Ala Gly Val Gly Gly Ala Thr Leu Leu
130 135 140

Thr Ala Ser Trp Trp Pro Trp Tyr Asp Phe Glu Leu His Ser Arg Val
145 150 155 160

Pro Ala Arg Ala Phe Arg Pro Val Pro Ser Val Asp Gly Gly Leu Phe
165 170 175

Ser Met Val Arg Arg Gly Thr Pro Leu Val Asp Asp Arg Arg Gly Tyr
180 185 190

Gln Glu Phe Val Arg Leu Val Phe Thr Gly Lys Gly His Gly Leu Pro
195 200 205

Glu Ile Leu Gln Arg Thr Gly Arg Ile Ala Arg Lys Asp Gln Gln Asp
210 215 220

Trp Gln Arg Ala Asn Arg Val Gly Pro Gln His Leu Pro Lys Asp Leu
225 230 235 240

Thr Ala His Gln Trp Ala Ser Leu Trp His Leu Val Ala Pro Ala Arg
245 250 255

Pro Ala Gly Pro Arg Arg Pro Ala Pro Arg Arg Pro Gly Ser Pro Ala
260 265 270

Ser Ala Arg Arg Arg
275

<210> 39
<211> 834
<212> DNA
<213> Micromonospora carbonacea subspecies aurantiaca
<400> 39
atgccctaca tccagcacgc cgggcgacat gaattcggcc agaatttcct ggtcgaccgc 60
tcggtgatcg acgatttcgt cgaactcgtc gcccggaccg acggccctat cgtggagatc 120
ggcgccggcg acggtgcgct gaccctaccc ctgagccggc agggaaggga gttgaccgca 180
gtggagatcg actccaagcg ttccaagcgg ctcagccggc agacacccga caacgtcacc 240
gtggtctgcg cggatgtcct gagcttccgg ttcccccagc atccgcacgt ggtcgtcggg 300
aacatcccct tccacgtgac cacccccatc gtgcgggctc tcctcgccgc ggaccactgg 360
cacacggcgg tgctgctggt gcagtgggag gtggcccgca ggcgggccgg cgtcggcggc 420
gcgacgctgc tgaccgcgag ctggtggccc tggtacgact tcgaactgca ctcccgggtt 480
ccggcccgcg ccttccggcc tgtcccttcc gtcgacggcg ggctgttctc catggtccgt 540
cgcgggaccc cgctggtcga cgaccggagg ggttaccagg aattcgtccg gctggtgttc 600
accggcaagg ggcacggatt gccggagatc cttcagcgga ccgggcggat cgcccgcaag 660
gaccagcagg actggcaacg ggccaaccgg gtggggccgc agcacctgcc caaggacctg 720
accgcccacc agtgggcctc cctgtggcac ctggtggcac ccgcccggcc ggccggcccc 780
cgccgtccgg caccgcgccg gccaggaagc cccgcttcgg cgcgccggcg ctga 834